I. REAL PARTY IN INTEREST 

The real party in interest is STMicroelectronics Asia Pacific (PTE) Limited, which 
is the assignee of the present invention. The assignment of record is to STMicroelectronics Asia 
Pacific PTE Limited, having an address at 28 Ang Mo Kio Industrial Park 2, Singapore, 569508 
Singapore. 

II. RELATED APPEALS AND INTERFERENCES 

There are no other appeals, interferences or judicial proceedings which directly 
affect or will be directly affected by or have a bearing on the Board's decision in this appeal. This 
application is a conversion of PCT Application No. PCT/SG97/00037 filed August 29, 1997, into a 
U.S. National Application. 

III. STATUS OF CLAIMS 

Claims 1-20 are currently pending in this application. All pending active claims are 
attached hereto as Appendix A. 

Claims 1-6, 11, 14 and 16-19 were rejected under 35 U.S.C. § 103(a) as being 
unpatentable over de Sousa (European Patent Application 0 564 089 A1) in view of Uramoto 
(European Patent Application 0 506 111 A2). 

Claims 8-10 and 20 were rejected under 35 U.S.C. § 103(a) as being unpatentable 
over Uramoto in view of ISO Standard 11172-3. 

Claims 7, 12-13 and 15 were rejected under 35 U.S.C. § 103(a) as being 
unpatentable over de Sousa in view of Uramoto and in further view of ISO Standard 11172-3. 

The rejection of claims 1-20 is appealed. 

IV. STATUS OF AMENDMENTS 

No amendments were filed subsequent to the final rejection. 

V. SUMMARY OF CLAIMED SUBJECT MATTER 

The following summary discusses the subject matter of the independent appealed 
claims along with references to portions of the specification and drawings that provide support 
for the claims. The references are provided for exemplary purposes and are not intended to 
restrict the scope of the claims to the particular embodiments corresponding to the references 
provided. 

Embodiments of the invention are directed to enhanced synthesis sub-band 
filtering during decoding of digital audio signals. Embodiments decode, for example, MPEG 1 
layer 2 encoded audio. An Inverse Modified Discrete Cosine Transform (IMDCT) is 
implemented using addition/subtraction followed by multiplication. 

Specifically, independent claim 1 is directed to a method of decoding digital 
audio data comprising the steps of obtaining an input sequence of data elements representing 
encoded audio samples, preprocessing the input sequence of data elements by calculating an array 
of sum data and an array of difference data using selected data elements from the input sequence, 
calculating a first sequence of output values using the array of sum data, calculating a second 
sequence of output values using the array of difference data and forming decoded audio 
signals from the first and second sequences of output values. Specification references are to 
the published PCT application. Support for independent claim 1 can be found in the last 
paragraph on page 2. More detail in the form of examples is provided on pages 2-12 and Figures 
3-5. 

Independent claim 8 is directed to a method of decoding a sequence of m, m an even 
positive integer, input digital audio data samples S[k], where k = 0, 1, ... (m-1), to produce a set of 
n, n an even positive integer, output audio data samples V[i], where i = 0, 1, ...(n-1), comprising 
the steps of: 

a) calculating an array of sum data SADD[k] according to 

SADD[k] = S[k] + S[m-1-k] for k = 0, 1, ...(m/2-1) 

b) calculating an array of difference data SSUB[k] according to 

SSUB[k] = S[k] - S(m-1-k) for k = 0, 1, ...(m/2-1) 

c) calculating a first output audio data sample by a multiply-accumulate operation according to 

V[2i] = V[2i] + N[i, k]*SADD[k] for k = 0, 1, ... (m/2-1) 

where N[i, k] = cos[(32 + 2i)(2k + 1)π / 64] 

d) calculating a second output audio data sample by a multiply-accumulate operation according to 

V[2i+1] = V[2i+1] + N[i, k]*SSUB[k] for k = 0, 1, ... (m/2-1) 

where N[i, k] = cos[(32 + (2i + 1))(2k + 1)π / 64] 

e) and repeating steps c) and d) for i = 0, 1, ... (n/2-1) to obtain a full set of output data. 

Support for claim 8 can be found in the last paragraph beginning on page 3 and 
continuing to page 4. 
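
By way of illustration only, steps a) through e) of claim 8 can be sketched in Python as follows. The function names `synthesis_sum_diff` and `synthesis_direct` are hypothetical and do not appear in the specification. The direct form is an assumed reference, valid for m = 32, that applies the same cosine coefficients to all m input samples without any preprocessing. The sketch is not the implementation described in the specification.

```python
import math


def synthesis_sum_diff(S, n=32):
    """Sketch of claim 8: sum/difference preprocessing, then multiply-accumulate."""
    m = len(S)
    half = m // 2
    # Step a): array of sum data, SADD[k] = S[k] + S[m-1-k]
    s_add = [S[k] + S[m - 1 - k] for k in range(half)]
    # Step b): array of difference data, SSUB[k] = S[k] - S[m-1-k]
    s_sub = [S[k] - S[m - 1 - k] for k in range(half)]
    V = [0.0] * n
    # Step e): repeat steps c) and d) for i = 0, 1, ..., n/2 - 1
    for i in range(n // 2):
        for k in range(half):
            # Step c): even-indexed output uses the sum data
            n_even = math.cos((32 + 2 * i) * (2 * k + 1) * math.pi / 64)
            V[2 * i] += n_even * s_add[k]
            # Step d): odd-indexed output uses the difference data
            n_odd = math.cos((32 + (2 * i + 1)) * (2 * k + 1) * math.pi / 64)
            V[2 * i + 1] += n_odd * s_sub[k]
    return V


def synthesis_direct(S, n=32):
    """Assumed reference (m = 32): same coefficients applied to every input sample."""
    m = len(S)
    return [
        sum(math.cos((32 + j) * (2 * k + 1) * math.pi / 64) * S[k] for k in range(m))
        for j in range(n)
    ]
```

For m = 32 and n = 32 (the values recited in claim 9), the two functions agree to within floating-point rounding, while the sum/difference form performs m/2 rather than m multiplications per output sample.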

Independent claims 11 and 14 include means plus function elements. According 
to 37 CFR 41.37(c)(1)(v), such means plus function elements "must be identified and the 
structure, material, or acts described in the specification as corresponding to each claimed 
function must be set forth with reference to the specification by page and line number, and to the 
drawing, if any, by reference characters." Accordingly, the following shows claims 11 and 14 
together with the required information in parentheses. 

11. A synthesis sub-band filter for use in decoding digital audio data, 

comprising: 

means for receiving or retrieving an input sequence of data elements comprising 
encoded digital audio data; (Page 4, first full paragraph; Fig. 2, input to audio decoder 
circuit 20 and the description thereof on page 5, first full paragraph; Figure 3, step 44; 
Figure 4, step 84; Figure 5, step 104) 

pre-calculation means for calculating an array of sum data and an array of 
difference data using selected data elements from the input sequence; (Page 4, first full 
paragraph; Figure 2, audio decoder circuit 20 and the description thereof on page 5; 
Figure 3, step 52; Figure 4, step 86; Figure 5, steps 106, 108, 110) 

transform calculation means for calculating a first sequence of decoded output 
values using said array of sum data and a second sequence of decoded output values using said 
array of difference data (Page 4, first full paragraph; Figure 2, audio decoder circuit 20 and 
the description thereof on page 5; Figure 3, step 52; Figure 4, steps 88, 90, 92; Figure 5, 
steps 112, 114, 116, 118, 120, 122). 

14. An MPEG decoder comprising: 

means for receiving an input sequence of data elements comprising encoded 
digital audio data; (Page 4, first full paragraph; Fig. 2, input to audio decoder circuit 20 and 
the description thereof on page 5, first full paragraph; Figure 3, step 44; Figure 4, step 84; 
Figure 5, step 104) 

means for calculating an array of sum data and an array of difference data using 
selected data elements from the input sequence; and (Page 4, first full paragraph; Figure 2, 
audio decoder circuit 20 and the description thereof on page 5; Figure 3, step 52; Figure 4, 
step 86; Figure 5, steps 106, 108, 110) 

means for calculating a first sequence of decoded output values using said array 
of sum data and a second sequence of decoded output values using said array of difference data 
(Page 4, first full paragraph; Figure 2, audio decoder circuit 20 and the description thereof 
on page 5; Figure 3, step 52; Figure 4, steps 88, 90, 92; Figure 5, steps 112, 114, 116, 118, 
120, 122). 

VI. ISSUES 

1. Whether claims 1-6, 11, 14 and 16-19 are unpatentable over de Sousa 
(European Patent Application 0 564 089 A1) in view of Uramoto (European Patent Application 
0 506 111 A2). 

2. Whether claims 8-10 and 20 are unpatentable over Uramoto in view of ISO 
Standard 11172-3. 

3. Whether claims 7, 12-13 and 15 are unpatentable over de Sousa in view of 
Uramoto and in further view of ISO Standard 11172-3. 

VII. ARGUMENT 

A. Introduction 

The Federal Circuit has held many times that the Examiner must provide 
objective evidence of a motivation for combining the teachings of cited references in the manner 
claimed. E.g., In re Sang-Su Lee, 277 F.3d 1338, 1343; 61 USPQ2d 1430, 1433 (Fed. Cir. 
2002). Further, "this factual question of motivation is material to patentability, and could not be 
resolved on subjective belief and unknown authority." Id. at 277 F.3d 1343-1344; 61 USPQ2d 
1433. Moreover, "the mere fact that the prior art may be modified in the manner suggested by 
the Examiner does not make the modification obvious unless the prior art suggested the 
desirability of the modification." In re Fritch, 972 F.2d 1260, 1266; 23 USPQ2d 1780, 1783-84 
(Fed. Cir. 1992). 

The Examiner initially bears the burden of establishing a prima facie case of 
obviousness. In re Bell, 26 U.S.P.Q.2d 1529 (Fed. Cir. 1993); In re Oetiker, 977 F.2d 1443, 
1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992); In re Piasecki, 745 F.2d 1468, 1472, 223 
U.S.P.Q. 785, 788 (Fed. Cir. 1984); MPEP § 2142. An Applicant may attack an obviousness 
rejection by showing that the Examiner has failed to properly establish a prima facie case or by 
presenting evidence tending to support a conclusion of non-obviousness. In re Fritch, 972 F.2d 
at 1265. 

In order for an examiner to establish a prima facie case that an invention, as 
defined by a claim at issue, is obvious the examiner must: (1) show some suggestion or 
motivation, either in the references themselves or in the knowledge generally available to one of 
ordinary skill in the art, to modify the reference or combine the reference teachings; (2) there 
must be a reasonable expectation of success; and (3) the prior art reference (or the combined 
references) must teach or suggest all the claim limitations. MPEP § 2142. "The teaching or 
suggestion to make the claimed combination and the reasonable expectation of success must both 
be found in the prior art, not in applicant's disclosure." MPEP § 2143. The level of skill in the 
art cannot be relied upon to provide the suggestion to combine the references. MPEP § 2143.01 
(citing Al-Site Corp. v. VSI Int'l Inc., 174 F.3d 1308, 50 U.S.P.Q.2d 1161 (Fed. Cir. 1999)). The 
mere fact that the references can be combined or modified does not render the resultant 
combination obvious unless the prior art also suggests the desirability of the combination. 
MPEP § 2143.01 (citing In re Mills, 916 F.2d 680, 16 U.S.P.Q.2d 1430 (Fed. Cir. 1990)). 

Moreover, a reference must be viewed as a whole, including portions that would 
lead away from the claimed invention. MPEP § 2141.03 (citing W.L. Gore & Assoc., Inc. v. 
Garlock, Inc., 721 F.2d 1540, 220 U.S.P.Q. 303 (Fed. Cir. 1983)). If the proposed modification 
would change the principles of operation of the prior art invention being modified, then the 
teachings of the references are not sufficient to render the claims prima facie obvious. MPEP § 
2143.01 (citing In re Ratti, 270 F.2d 810, 123 U.S.P.Q. 349 (CCPA 1959)). 

B. The Examiner Has Failed to Establish a Prima Facie Case of 
Obviousness 

1. Claims 1-6, 11, 14 and 16-19 are not obvious over de Sousa in 
view of Uramoto 

The Examiner rejected claims 1-6, 11, 14 and 16-19 under 35 U.S.C. § 103(a) as 
being unpatentable over de Sousa (European Patent Application 0 564 089 Al) in view of 
Uramoto (European Patent Application 0 506 111 A2). The Examiner has failed to establish a 
prima facie case of obviousness. Specifically, the Examiner has failed to show that the 
combination of de Sousa and Uramoto teaches, suggests or motivates the claimed invention. 
Moreover, Uramoto teaches away from the claimed invention. 

Claim 1 recites, in part "[a] method of decoding digital audio data, comprising 
the steps of . . . preprocessing the input sequence of data elements by calculating an array of sum 
data and an array of difference data using selected data elements from the input sequence ..." 
(emphasis added). Similarly, claim 11 recites, in part, "[a] synthesis sub-band filter for use in 
decoding digital audio data, comprising ... pre-calculation means for calculating an array of sum 
data and an array of difference data using selected data elements from the input sequence ..." 
(emphasis added). Claim 14 recites, in part, "[an] MPEG decoder comprising ... means for 
calculating an array of sum data and an array of difference data using selected data elements 
from the input sequence ..." (emphasis added). 

The portions of de Sousa and Uramoto to which the Examiner points do not teach 
or suggest a method of decoding digital audio data, as recited. To the extent either addresses 
decoding, a different method is taught. Specifically, de Sousa does not address decoding except 
in a conclusory form (i.e., de Sousa in one paragraph notes that the "decoder has a very simple 
structure" but provides no details on the structure of the decoder or the methods employed). See 
Figure 12 of de Sousa and the terse description thereof on page 17, lines 34-39. Further, de 
Sousa does not disclose the way the processing of the input data is performed. 

The portion of Uramoto to which the Examiner points teaches using the discrete 
cosine transform (DCT) for encoding. See Figure 5 of Uramoto and the accompanying 
description thereof on page 8, lines 15-37. Uramoto teaches using the inverse discrete cosine 
transform (IDCT) for decoding, which teaches post-processing "a sum and a difference between 
intermediate data." In other words, intermediate multiplication of the input occurs and it is the 
intermediate data that is subjected to addition and subtraction. See, e.g., the description of 
Figure 11 of Uramoto and the accompanying description thereof on page 10, line 48 through 
page 12, line 22. Accordingly, Uramoto teaches away from the claimed invention. Thus, the 
combination of de Sousa and Uramoto does not teach or suggest decoding digital audio data by 
"preprocessing the input sequence of data elements by calculating an array of sum data and an 
array of difference data ...; calculating a first sequence of output values using the array of sum 
data; calculating a second sequence of output values using the array of difference data; and 
forming decoded audio signals from the first and second sequences of output values" as 
recited. In fact, Uramoto teaches away from the claimed invention. 

With regard to the decoding operation of Uramoto, the Examiner stated that 
Uramoto discloses a processing unit operable in a decoding application in which the processing 
unit is "in its same form as the processing unit disclosed in Fig. 5." More specifically, and in 
reference to Fig. 11, Uramoto states "[p]ostprocessing section 7 has the same configuration as 
that of Fig. 5 or 6" (page 12, line 24). Although Uramoto discloses a postprocessing section 7 
(Fig. 11) that has the same configuration as preprocessing section 1 (Figs. 4 or 5), postprocessing 
section 7 does not "calculat[e] an array of sum data and an array of difference data using selected 
data elements from the input sequence," where the input sequence is an "input sequence of data 
elements representing encoded audio samples," as claimed (emphasis added). 

In reference to the postprocessing section 7 of the IDCT processor (Fig. 11) 
having the same configuration as the preprocessing section 1 of the DCT processor (Fig. 4), 
Uramoto states "input circuit 21 sequentially or alternately receives intermediate terms Mi (i = 0 
to 3), Ni (i = 0 to 3) to apply a desired combination of the terms to adder/subtractors 22, 23 (or 
26)" (page 12, lines 24-26). That is, the postprocessing section 7 operates on intermediate terms 
to generate output data xi that is either a sum of intermediate terms (Mi and Ni) or a difference of 
intermediate terms, based upon the value of the integer i (page 12, lines 17-22). However, 
postprocessing section 7 does not operate on selected data elements from the input sequence to 
generate sum and difference data, where the selected data elements represent encoded audio 
samples. In other words, although postprocessing section 7 does operate as preprocessing 
section 1 to generate sum or difference data, postprocessing unit 7 does not generate an array of 
sum data and an array of difference data using selected data elements from the input sequence, as 
claimed. 

Specifically, Uramoto discloses that x2 = M2 + N2 = A y0 - C y2 - A y4 + B y6 + 
F y1 - D y3 + G y5 + E y7 (page 11, expression 13 and page 12, lines 7-20). That is, the sum 
output data generated by Uramoto (i.e., x0, x1, x2 or x3) is not comprised of "selected data 
elements from the input sequence," as claimed. Instead, Uramoto generates an output x2, for 
example, that comprises additions and subtractions of products of input data (y0, y1, y2, y3, y4, 
y5, y6, y7) and elements (A, B, C, D, E, F, G) of a coefficient matrix (expression 13, page 11). 
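
The order of operations described above for Uramoto can be sketched as follows, for illustration only. The numerical values of coefficients A through G are not given in this brief; the sketch assumes the cosine values of a standard 8-point IDCT, and the function name `uramoto_style_x2` is hypothetical. The sketch is intended to show only that each input element is first multiplied by a coefficient, and that the addition M2 + N2 is then applied to the resulting intermediate terms rather than to the input elements themselves.

```python
import math

# Assumed coefficient values (standard 8-point IDCT cosines); not stated in the brief.
A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.cos(3 * math.pi / 8)
D = math.cos(math.pi / 16)
E = math.cos(3 * math.pi / 16)
F = math.cos(5 * math.pi / 16)
G = math.cos(7 * math.pi / 16)


def uramoto_style_x2(y):
    """Multiply first, then add: x2 = M2 + N2 per expression 13 as quoted in the brief."""
    # Intermediate term M2: products of the even-indexed inputs with A, B, C
    M2 = A * y[0] - C * y[2] - A * y[4] + B * y[6]
    # Intermediate term N2: products of the odd-indexed inputs with D, E, F, G
    N2 = F * y[1] - D * y[3] + G * y[5] + E * y[7]
    # Postprocessing: the sum is taken over intermediate terms, not over raw inputs
    return M2 + N2, M2, N2
```

In contrast, the claimed preprocessing forms sums and differences of the input elements themselves before any multiplication takes place.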

The Examiner's position appears to be that the combination of de Sousa and 
Uramoto could be further modified to achieve the claimed invention. The mere fact that 
references could be further modified is insufficient to establish obviousness, and the Examiner 
cites no motivation for this proposed further modification other than alleged skill in the art. 
Moreover, if the combination were further modified as the Examiner appears to suggest, the 
combination would not operate in accordance with the principles of operation of the decoder of 
Uramoto, which teaches an IDCT for decoding. Thus, the combination cannot be considered 
obvious. 

Accordingly, de Sousa and Uramoto do not teach, suggest, or motivate, nor has 
the Examiner shown, decoding using "selected data elements from the input sequence" to 
generate either an array of sum data or an array of difference data, as claimed. Based at least 
upon the above arguments, Applicants respectfully submit that claims 1, 11 and 14 are not 
obvious over de Sousa in view of Uramoto. 

Furthermore, since claims 2-6 and 18-19 depend either directly or indirectly from 
claim 1, and claims 16-17 depend from claim 14, Applicants submit that claims 2-6, 18-19 and 
claims 16-17 are allowable based at least upon the reasons given above in conjunction with 
claims 1, 11 and 14. 

2. Claims 8-10 and 20 are not obvious over Uramoto in view of ISO 
Standard 11172-3 

The Examiner rejected claims 8-10 and 20 under 35 U.S.C. § 103(a) as being 
unpatentable over Uramoto in view of ISO Standard 11172-3. The Examiner has failed to 
establish a prima facie case of obviousness. 

Claim 8 recites: "[a] method of decoding ... input digital audio data samples ... 
comprising the steps of: ... calculating an array of sum data ... [;] calculating an array of 
difference data ... [;] calculating a first output audio data sample by a multiply-accumulate 
operation ... [;] calculating a second output audio data sample by a multiply-accumulate operation 
according." The Examiner again points to the description of Figure 5 of Uramoto, which 
describes an encoder. As discussed above, Uramoto teaches away from the claimed invention by 
describing the use of a different method of decoding. See, e.g., the description of Figure 11 of 
Uramoto. Further, one would not be motivated to combine the inverse modified discrete cosine 
transform (IMDCT) with Uramoto, which as discussed above teaches the DCT for encoding and 
the IDCT for decoding. 

Further, the sum output data xi = Mi + Ni for i = 0, 1, 2, 3 and the difference 
output data xi = Mi - Ni for i = 4, 5, 6, 7 generated by postprocessing section 7 (Uramoto, page 
12, lines 20-22 and Fig. 11) is not the same as the array of sum data SADD[k] = S[k] + S[m-1-k] 
and the array of difference data SSUB[k] = S[k] - S(m-1-k) (for k = 0, 1, ...(m/2-1)), as claimed. 
Uramoto discloses Mi to be an intermediate term comprised of additions and/or subtractions of 
products of input data (y0, y2, y4, y6) with coefficients A, B and C, and Ni to be an intermediate 
term comprised of additions and/or subtractions of products of input data (y1, y3, y5, y7) with 
coefficients D, E, F and G (page 11, expression 13 to page 12, line 22). In contrast, S[k] and 
S[m-1-k] are coded input digital audio data samples. In other words, it is clear that xi = Mi ± Ni 
does not equal either SADD[k] or SSUB[k], since S[k] does not equal Mi and S[m-1-k] does not 
equal Ni. 

Based at least upon the above arguments, Applicants submit that claim 8 is not 
obvious over Uramoto in view of ISO Standard 11172-3. Furthermore, since claims 9-10 and 20 
depend either directly or indirectly from claim 8, Applicants submit that claims 9-10 and 20 are 
allowable for at least the same reasons given above in conjunction with claim 8. 

3. Claims 7, 12-13 and 15 are not obvious over de Sousa in view of 
Uramoto and ISO Standard 11172-3 

The Examiner rejected claims 7, 12-13 and 15 under 35 U.S.C. § 103(a) as being 
unpatentable over de Sousa in view of Uramoto and in further view of ISO Standard 11172-3. 
The Examiner has failed to establish a prima facie case of obviousness. 

Neither de Sousa nor ISO Standard 11172-3 remedies the deficiencies of Uramoto 
as discussed above in conjunction with claims 1, 11 and 14. Thus, Applicants respectfully 
submit that since claim 7 depends from claim 1, claims 12-13 depend either directly or indirectly 
from claim 11, and claim 15 depends from claim 14, claims 7, 12-13 and 15 are allowable based 
at least upon the reasons given above in conjunction with claims 1, 11 and 14, respectively, and 
request that claims 7, 12-13 and 15 be allowed. 

VIII. CLAIMS APPENDIX 

A copy of the claims as currently pending is attached hereto as Appendix A. 

IX. EVIDENCE APPENDIX 

The Final Office Action mailed on June 20, 2005, and referred to herein is 
attached hereto as Appendix B. 

A copy of the references cited by the Examiner and referred to herein is attached 
hereto as Appendix C. The references were cited in an Office Action mailed on June 20, 2005. 






CONCLUSION 



The Examiner has failed to establish a prima facie case that the claims are rendered 
unpatentable over de Sousa, whether considered alone or in combination with Uramoto and/or 
ISO Standard 11172-3. Moreover, Uramoto teaches away from the claims, and modifying 
Uramoto as suggested by the Examiner would change the principles of operation of Uramoto. 
Accordingly, allowance of the claims is respectfully requested. 

APPENDIX A 
CLAIMS INVOLVED IN THE APPEAL 



1. (Previously Presented) A method of decoding digital audio data, 
comprising the steps of: 

obtaining an input sequence of data elements representing encoded audio samples; 
preprocessing the input sequence of data elements by calculating an array of sum 
data and an array of difference data using selected data elements from the input sequence; 

calculating a first sequence of output values using the array of sum data; 
calculating a second sequence of output values using the array of difference data; and 

forming decoded audio signals from the first and second sequences of output values. 

2. (Previously Presented) A method as claimed in claim 1 wherein the 
array of sum data is obtained by adding together respective first and second data elements from the 
input sequence, the first and second data elements being selected from mutually exclusive sub- 
sequences of the input sequence. 

3. (Previously Presented) A method as claimed in claim 1 wherein the 
array of difference data is obtained by subtracting respective first data elements from corresponding 
second data elements of the input sequence, the first and second data elements being selected from 
mutually exclusive sub-sequences of the input sequence. 

4. (Previously Presented) A method as claimed in claim 1 wherein 
the step of calculating an array of sum data and an array of difference data comprises: 

dividing the input data sequence into first and second equal sized sub-sequences, 
the first sub-sequence comprising the high order data elements of the input sequence and the 
second sub-sequence comprising the low order data elements of the input sequence; 

calculating the array of sum data by adding together each respective data element 
of the first sub-sequence with a respective corresponding data element of the second sub-sequence; 
and 

calculating the array of difference data by subtracting each respective data element 
of the first sub-sequence from a respective corresponding data element of the second sub-sequence. 

5. (Previously Presented) A method as claimed in claim 1 wherein 
the step of calculating a first sequence of output values comprises performing a 
multiply-accumulate operation utilizing each of the sum data elements. 

6. (Previously Presented) A method as claimed in claim 1, wherein the step 
of calculating a second sequence of output values comprises performing a multiply-accumulate 
operation utilizing each of the difference data elements. 

7. (Previously Presented) A method as claimed in claim 1 wherein the 
input sequence of data elements is derived from MPEG encoded audio data, and wherein the decoded 
audio signals comprise pulse code modulation samples. 

8. (Original) A method of decoding a sequence of m, m an even positive 
integer, input digital audio data samples S[k], where k = 0, 1, ... (m-1), to produce a set of n, n an 
even positive integer, output audio data samples V[i], where i = 0, 1, ...(n-1), comprising the steps 
of: 

a) calculating an array of sum data SADD[k] according to 

SADD[k] = S[k] + S[m-1-k] for k = 0, 1, ...(m/2-1) 

b) calculating an array of difference data SSUB[k] according to 

SSUB[k] = S[k] - S(m-1-k) for k = 0, 1, ...(m/2-1) 

c) calculating a first output audio data sample by a multiply-accumulate operation according to 

V[2i] = V[2i] + N[i, k]*SADD[k] for k = 0, 1, ... (m/2-1) 

where N[i, k] = cos[(32 + 2i)(2k + 1)π / 64] 

d) calculating a second output audio data sample by a multiply-accumulate operation according to 

V[2i+1] = V[2i+1] + N[i, k]*SSUB[k] for k = 0, 1, ... (m/2-1) 

where N[i, k] = cos[(32 + (2i + 1))(2k + 1)π / 64] 

e) and repeating steps c) and d) for i = 0, 1, ... (n/2-1) to obtain a full set of output data. 

9. (Original) A method as claimed in claim 8, wherein the number m of 
input digital audio data samples is 32, and the number n of output audio data samples is 32. 

10. (Previously Presented) A method as claimed in claim 8 wherein 
the decoding steps are repeated for decoding a series of frames of encoded audio data in an 
MPEG format. 

11. (Previously Presented) A synthesis sub-band filter for use in 
decoding digital audio data, comprising: 

means for receiving or retrieving an input sequence of data elements comprising 
encoded digital audio data; 

pre-calculation means for calculating an array of sum data and an array of 
difference data using selected data elements from the input sequence; and 

transform calculation means for calculating a first sequence of decoded output 
values using said array of sum data and a second sequence of decoded output values using said 
array of difference data. 

12. (Original) A synthesis sub-band filter as claimed in claim 11 
wherein the pre-calculation means and transform calculation means collectively perform an 
inverse modified discrete cosine transform of the encoded digital audio data. 

13. (Previously Presented) An MPEG decoder including a synthesis 
sub-band filter as claimed in claim 12. 

14. (Previously Presented) An MPEG decoder comprising: 

means for receiving an input sequence of data elements comprising encoded 
digital audio data; 

means for calculating an array of sum data and an array of difference data using 
selected data elements from the input sequence; and 

means for calculating a first sequence of decoded output values using said array 
of sum data and a second sequence of decoded output values using said array of difference data. 

15. (Previously Presented) The MPEG decoder of claim 14 wherein the 
means for receiving an input sequence comprises a bitstream unpacking and decoding circuit. 

16. (Previously Presented) The MPEG decoder of claim 14 wherein the 
means for calculating an array of sum data and an array of difference data comprises a 
reconstruction circuit. 

17. (Previously Presented) The MPEG decoder of claim 14 wherein the 
means for calculating a first sequence of decoded output values comprises an inverse mapping 
circuit. 

18. (Previously Presented) The method of claim 2 wherein the array of 
difference data is obtained by subtracting respective first data elements from corresponding second 
data elements of the input sequence, the first and second data elements being selected from 
mutually exclusive sub-sequences of the input sequence. 

19. (Previously Presented) The method of claim 5 wherein the step of 
calculating a second sequence of output values comprises performing a multiply-accumulate 
operation utilizing each of the difference data elements. 

20. (Previously Presented) The method of claim 9 wherein the 
decoding steps are repeated for decoding a series of frames of encoded audio data in an 
MPEG format. 

Application/Control Number: 09/486,582 Page 2 

Art Unit: 2644 

DETAILED ACTION 
Response to Arguments 

Applicant's arguments filed 24 May 2005 have been fully considered but they are 
not persuasive. 

In regards to claims 1 and 11, Applicant states regarding the de Sousa art: 

"de Sousa does not address decoding except in a conclusory form 
(i.e., de Sousa in one paragraph notes that the 'decoder has a very simple 
structure' but provides no details on the structure of the decoder or the 
methods employed)." 

Examiner agrees with the above statements. However, the purpose of the 
addition of the Uramoto reference is to address the level of detail of the decoding. This 
is further referenced in the rejection of claim 1 in the previous action dated 24 February 
2005 and acknowledged by the applicant within the arguments above. 

Further, Applicant argues the portion of the Uramoto reference to which the 
previous action points to teaches using the DCT for encoding as opposed to decoding 
and that the method Uramoto teaches for decoding is different than applicants claimed 
invention. 

However, in the previous reference the elements referred to in Uramoto are 
directed to a processing unit, regardless of its use in encoding or decoding. This 
processing unit is operable in a decoding application as is further evidenced by line 24 
of page 12 in its same form as the processing unit disclosed in Fig. 5 to which the 
previous action refers to. As such, the processing section, as referenced in the 
previous action, may be used in the decoding of DCT encoded signals and reads upon 
the claimed limitations of the application. Therefore the argument is not persuasive and 
the previous rejection stands. 

In regards to claim 8, Applicant again states the portion of the Uramoto 
reference to which the previous action points to teaches using the DCT for encoding as 
opposed to decoding and that the method Uramoto teaches for decoding is different 
than applicants claimed invention. 

For the same reasons as stated above regarding the arguments for claims 1 and 
11, this argument is not persuasive. 

Further, applicant states one would not be motivated to combine the IMDCT with 
Uramoto. However, Examiner disagrees. Applicant has not given sufficient evidence 
explaining why one would not have been motivated to do so; applicant has only made a 
conclusory statement that Uramoto teaches DCT and IDCT for encoding and 
decoding respectively. But, as stated in the previous rejection, using the IMDCT is one 
of the many implementations that one of ordinary skill in the art would have been 
motivated to use. As such the rejection stands. 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 1 - 6, 11, 14 and 16 - 19 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over de Sousa (European Patent Application 0 564 089 A1) in view of 
Uramoto (European Patent Application 0 506 111 A2). 

Regarding Claims 1, 11 and 14, Sousa discloses a method of decoding digital 
audio data (page 2, lines 11-13), a step of obtaining an input sequence of data elements 
representing encoded audio samples (page 6 lines 51 - 53), preprocessing the input 
sequence of data elements (page 17 lines 32 - 39 and fig 12), performing a modified 
discrete cosine transform (abstract) and forming decoded audio signals (page 17 lines 
32 - 39). Sousa does not disclose the way the processing of the input data is 
performed. Uramoto discloses a data processing method for video data (page 8, lines 15 
- 37 and fig 5), method steps of calculating an array of sum data (page 8 lines 27 - 30), 
an array of difference data (page 8 line 31), calculating a first sequence of output values 
using the array of sum data (page 8 lines 32 - 37), calculating a second sequence of 
output values using the array of difference data (page 8 lines 32 - 37). It would have 
been obvious to one of ordinary skill in the art at the time of the invention, namely when 
the same result is to be achieved, i.e. to reduce the amount of processing required for 
decoding, to apply the features of Uramoto to Sousa thereby arriving at a method 
according to claim 1. 

Regarding Claim 2, in addition to the elements stated above regarding claim 1, 
Uramoto further discloses the input circuit receives sequentially output sets of data (x0, 
x7) and then adder 22 adds the data i.e. (x0 + x7) (page 8 lines 27 - 29) (i.e. wherein 
the array of sum data is obtained by adding together respective first and second data 
elements from the input sequence, the first and second data elements being selected 
from mutually exclusive sub-sequences of the input sequence). 

Regarding Claim 3, in addition to the elements stated above regarding claim 1, 
Uramoto further discloses the input circuit receives sequentially output sets of data (x0, 
x7) and then subtracter 23 subtracts the data (x0 - x7) (page 8 lines 27 - 31) (i.e. 
wherein the array of difference data is obtained by subtracting respective first data 
elements from corresponding second data elements of the input sequence, the first and 
second data elements being selected from mutually exclusive subsequences of the 
input sequence). 

Regarding Claim 4, in addition to the elements stated above regarding claim 1, 
Uramoto discloses dividing X into (x0, x7), the first and last elements of X (page 8 lines 
27 - 28) (i.e. wherein the step of calculating an array of sum data and an array of 
difference data comprises dividing the input data sequence into first and second equal 
sized sub-sequences, the first sub-sequence comprising the higher order data elements 
of the input sequence and the second sub-sequence comprising the low order data 
elements of the input sequence), adder 22 adds the data i.e. (x0 + x7) (page 8 lines 27 
- 29) (i.e. calculating the array of sum data by adding together each respective data 
element of the first subsequence with a respective corresponding data element of the 
second sub-sequence) and then subtracter 23 subtracts the data (x0 - x7) (page 8 lines 
27 - 31) (i.e. and calculating the array of difference data by subtracting each respective 
data element of the first sub-sequence from a respective corresponding data element of 
the second sub-sequence). 
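
The pairing relied on in the rejections of claims 2 through 4 can be illustrated with a short sketch. It is an illustration only, not text from Uramoto or from the claims: the function name `sum_and_difference` is hypothetical, and the sign convention (x0 - x7) follows the Examiner's characterization of subtracter 23.

```python
def sum_and_difference(seq):
    """Pair element k with element m-1-k and return (sums, differences).

    Hypothetical helper: splits an even-length sequence into two halves and
    combines mirrored elements, e.g. (x0, x7), (x1, x6), (x2, x5), (x3, x4).
    """
    m = len(seq)
    if m % 2:
        raise ValueError("sequence length must be even")
    half = m // 2
    sums = [seq[k] + seq[m - 1 - k] for k in range(half)]
    diffs = [seq[k] - seq[m - 1 - k] for k in range(half)]
    return sums, diffs


if __name__ == "__main__":
    sums, diffs = sum_and_difference([0, 1, 2, 3, 4, 5, 6, 7])
    print(sums, diffs)
```

Each output array is half the length of the input, which is the source of the reduced multiply-accumulate count discussed in connection with claim 8.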

Regarding Claim 5, in addition to the elements stated above regarding claim 1, 
the output of the addition and subtraction (fig. 5 element 500) is applied to a data 
rearranging circuit which supplies an output (fig. 7A elements 500 and 501), this output 
is then applied to a product sum operation circuit (fig. 8 element 501) (i.e. wherein the 
step of calculating a first sequence of output values comprises performing a 
multiply-accumulate operation utilizing each of the sum data elements). 

Regarding Claim 6, in addition to the elements stated above regarding claim 1, 
the output of the addition and subtraction (fig. 5 element 500) is applied to a data 
rearranging circuit which supplies an output (fig. 7A elements 500 and 501), this output 
is then applied to a product sum operation circuit (fig. 8 element 501) (i.e. wherein the 
step of calculating a second sequence of output values comprises performing a 
multiply-accumulate operation utilizing each of the difference data elements). 

Regarding Claim 16, in addition to the elements stated above regarding claim 
14, Uramoto further discloses: 

wherein the means for calculating an array of sum data and an array of 
difference data comprises a reconstruction circuit (i.e. the sum and difference 
operations are part of a processing circuit; pages 8 and 12). 

Regarding Claim 17, in addition to the elements stated above regarding claim 
14, Uramoto further discloses: 

Wherein the means for calculating a first sequence of decoded output values 
comprises an inverse mapping circuit (i.e. the output circuit outputs the addition and 
subtraction data; page 8). 

Regarding Claim 18, in addition to the elements stated above regarding claim 2, 
Uramoto further discloses: 

wherein the array of difference data is obtained by subtracting respective first 
data elements from corresponding second data elements of the input sequence, the first 
and second data elements being selected from mutually exclusive subsequences of the 
input sequence (i.e. the subtracter selects the set of x0 and x7 to create a difference 
value from the sets of data of (x0, x7), (x1, x6), (x2, x5) and (x3, x4)). 

Regarding Claim 19, in addition to the elements stated above regarding claim 1, 
the output of the addition and subtraction (fig. 5 element 500) is applied to a data 
rearranging circuit which supplies an output (fig. 7A elements 500 and 501), this output 
is then applied to a product sum operation circuit (fig. 8 element 501) (i.e. wherein the 
step of calculating a first sequence of output values comprises performing a 
multiply-accumulate operation utilizing each of the sum data elements). 



Claims 8-10 and 20 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Uramoto (European Patent Application 0 506 111 A2) in view of ISO 
Standard 11172-3. 

Regarding Claim 8, Uramoto discloses adder 22 adds the data i.e. (x0 + x7) 
(page 8 lines 27 - 29) and subtracter 23 subtracts the data (x0 - x7) (page 8 lines 27 - 
31) i.e. 

a) calculating an array of sum data $S_{ADD}[k]$ according to 

$S_{ADD}[k] = S[k] + S[m-1-k]$ for $k = 0, 1, \ldots, (m/2-1)$ 

b) calculating an array of difference data $S_{SUB}[k]$ according to 

$S_{SUB}[k] = S[k] - S[m-1-k]$ for $k = 0, 1, \ldots, (m/2-1)$ 

Uramoto does not disclose the rest of the claimed limitations in claim 8. ISO discloses 
an inverse modified discrete cosine transform (page 36). ISO also discloses multiplying 
samples by this function (page 41) i.e. 

c) calculating a first output audio data sample by a multiply-accumulate operation 
according to 

$V[2i] = V[2i] + N[i, k] \cdot S_{ADD}[k]$ for $k = 0, 1, \ldots, (m/2-1)$ 

where $N[i, k] = \cos(\ldots)$ [cosine argument illegible in source] 

d) calculating a second output audio data sample by a multiply-accumulate operation 
according to 

$V[2i+1] = V[2i+1] + N[i, k] \cdot S_{SUB}[k]$ for $k = 0, 1, \ldots, (m/2-1)$ 

where $N[i, k] = \cos(\ldots)$ [cosine argument illegible in source] 

e) and repeating steps c) and d) for $i = 0, 1, \ldots, (n/2-1)$ to obtain a full set of output data. 

It would have been obvious to one of ordinary skill in the art at the time of the invention 
to use Uramoto's samples in ISO's decoder. It is merely one of many straightforward 
implementations of decoding audio within ISO's decoder and does not involve the 
exercise of inventive skill. 
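
Steps a) through e) can be checked numerically with the following sketch. It is offered under stated assumptions and is not taken from the record: because the cosine arguments are illegible in this copy, the sketch assumes the synthesis matrixing commonly associated with ISO 11172-3, N[i][k] = cos((16 + i)(2k + 1)π/64) with m = 32 and n = 64, and the function names are hypothetical. Under that assumption the even-numbered rows are symmetric in k and the odd-numbered rows are antisymmetric, so each output needs m/2 multiplications instead of m.

```python
import math

M = 32  # number of sub-band samples per input vector (assumed, m in the claim)
N_OUT = 64  # number of output values V[i] (assumed, n in the claim)


def coeff(i, k):
    """Assumed matrixing coefficient; not recoverable from the garbled source."""
    return math.cos((16 + i) * (2 * k + 1) * math.pi / 64)


def matrix_direct(s):
    """Reference computation: one multiply-accumulate per (i, k) pair."""
    return [sum(coeff(i, k) * s[k] for k in range(M)) for i in range(N_OUT)]


def matrix_folded(s):
    """Steps a)-e): fold the input into sum/difference arrays first."""
    half = M // 2
    s_add = [s[k] + s[M - 1 - k] for k in range(half)]  # step a)
    s_sub = [s[k] - s[M - 1 - k] for k in range(half)]  # step b)
    v = [0.0] * N_OUT
    for i in range(N_OUT // 2):  # step e)
        for k in range(half):
            v[2 * i] += coeff(2 * i, k) * s_add[k]  # step c)
            v[2 * i + 1] += coeff(2 * i + 1, k) * s_sub[k]  # step d)
    return v


if __name__ == "__main__":
    samples = [math.sin(0.3 * k) + 0.1 * k for k in range(M)]
    err = max(abs(a - b) for a, b in zip(matrix_direct(samples), matrix_folded(samples)))
    print("max difference between direct and folded results:", err)
```

The folded form performs 64 x 16 multiplications where the direct form performs 64 x 32, which is the kind of processing reduction the rejection refers to.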

Regarding Claim 9, in addition to the elements stated above regarding claim 8, 
ISO discloses any number of samples from 12 - 36 (page 36). 

Regarding Claim 10, in addition to the elements stated above regarding claim 8, 
ISO discloses decoding MPEG audio (page 41 and title). 

Regarding Claim 20, in addition to the elements stated above regarding claim 9, 
wherein the steps of decoding are repeated for decoding a series of frames of encoded 
audio data in an MPEG format (i.e. the bit stream inputs a series of MPEG frames to be 
decoded; page 41). 

Claims 7, 12, 13 and 15 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Sousa (European Patent Application 0 564 089 A1) in view of 
Uramoto (European Patent Application 0 506 111 A2) and in further view of ISO 
Standard 11172-3. 

Regarding Claim 7, in addition to the elements stated above regarding claim 1, 
the combination of de Sousa in view of Uramoto does not disclose the limitations of 
claim 7. 

ISO discloses wherein the input sequence of data elements is derived from 
MPEG encoded audio data (page 41 and title), and wherein the decoded audio signals 
comprise pulse code modulation samples (i.e. the audio data left and right channel 
outputs; page 41). It would have been obvious to one of ordinary skill in the art at the 
time of the invention to use the samples of the combination in ISO's decoder. It is 
merely one of many straightforward implementations of decoding audio within ISO's 
decoder and does not involve the exercise of inventive skill. 

Regarding Claim 12, in addition to the elements stated above regarding claim 
11, ISO discloses the use of the inverse modified discrete cosine transform to decode 
audio data (pages 36 and 41). 

Regarding Claim 13, in addition to the elements stated above regarding claim 
11, ISO discloses decoding MPEG audio (page 41 and title). 

Regarding Claim 15, in addition to the elements stated above regarding claim 
14, ISO discloses wherein the means for receiving an input sequence comprises a 
bitstream unpacking and decoding circuit (page 41). 

Conclusion 

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this 
communication or earlier communications from the examiner should be directed to 
Andrew C. Flanders whose telephone number is (571) 272-7516. The examiner can 
normally be reached on M-F 8:30 - 5:00. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Vivian Chin can be reached on (571) 272-7848. The fax phone number for 
the organization where this application or proceeding is assigned is 703-872-9306. 



Application/Control Number: 09/486,582 
Art Unit: 2644 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Cross-Reference to Related Applications and Materials 

The following U.S. patent applications filed concurrently with the present application and assigned to the 
assignee of the present application are related to the present application and each is hereby incorporated 
herein as if set forth in its entirety: "RATE LOOP PROCESSOR FOR PERCEPTUAL ENCODER/DECODER" by 
J.D. Johnston; "A METHOD AND APPARATUS FOR CODING AUDIO SIGNALS BASED ON PERCEPTUAL 
MODEL" by J.D. Johnston; and "AN ENTROPY CODER" by J.D. Johnston and J.A. Reeds. 

Field of the Invention 



The present invention relates to processing of information signals, and more particularly, to the efficient 
encoding and decoding of monophonic and stereophonic audio signals, including signals representative of 
voice and music information, for storage or transmission. 

Background of the Invention 

Consumer, industrial, studio and laboratory products for storing, processing and communicating high quality 
audio signals are in great demand. For example, so-called compact disc ("CD") and digital audio tape 
("DAT") recordings for music have largely replaced the long-popular phonograph record and cassette tape. 
Likewise, recently available digital audio tape ("DAT") recordings promise to provide greater flexibility and high 
storage density for high quality audio signals. See, also, Tan and Vermeulen, "Digital audio tape for data 
storage", IEEE Spectrum, pp. 34-38 (Oct. 1989). A demand is also arising for broadcast applications of digital 
technology that offer CD-like quality. 

While these emerging digital techniques are capable of producing high quality signals, such performance 
is often achieved only at the expense of considerable data storage capacity or transmission bandwidth. 
Accordingly, much work has been done in an attempt to compress high quality audio signals for storage and 
transmission. 

Most of the prior work directed to compressing signals for transmission and storage has sought to reduce 
the redundancies that the source of the signals places on the signal. Thus, such techniques as ADPCM, 
sub-band coding and transform coding described, e.g., in N.S. Jayant and P. Noll, "Digital Coding of Waveforms," 
Prentice-Hall, Inc. 1984, have sought to eliminate redundancies that otherwise would exist in the source 
signals. 

In other approaches, the irrelevant information in source signals is sought to be eliminated using 
techniques based on models of the human perceptual system. Such techniques are described, e.g., in E.F. Schroeder 
and J.J. Platte, "'MSC': Stereo Audio Coding with CD-Quality and 256 kBIT/SEC," IEEE Trans. on Consumer 
Electronics, Vol. CE-33, No. 4, November 1987; and Johnston, Transform Coding of Audio Signals Using Noise 
Criteria, Vol. 6, No. 2, IEEE J.S.C.A. (Feb. 1988). 

Perceptual coding, as described, e.g., in the Johnston paper relates to a technique for lowering required 
bitrates (or reapportioning available bits) or total number of bits in representing audio signals. In this form of 
coding, a masking threshold for unwanted signals is identified as a function of frequency of the desired signal. 
Then, inter alia, the coarseness of quantizing used to represent a signal component of the desired signal is 
selected such that the quantizing noise introduced by the coding does not rise above the noise threshold, 
though it may be quite near this threshold. The introduced noise is therefore masked in the perception process. 
While traditional signal-to-noise ratios for such perceptually coded signals may be relatively low, the quality 
of these signals upon decoding, as perceived by a human listener, is nevertheless high. 

Brandenburg et al, U.S. Patent 5,040,217, issued August 13, 1991, describes a system for efficiently coding 
and decoding high quality audio signals using such perceptual considerations. In particular, using a measure 
of the "noise-like" or "tone-like" quality of the input signals, the embodiments described in the latter system 
provide a very efficient coding for monophonic audio signals. 

It is, of course, important that the coding techniques used to compress audio signals do not themselves 
introduce offensive components or artifacts. This is especially important when coding stereophonic audio 
information where coded information corresponding to one stereo channel, when decoded for reproduction, can 
interfere or interact with coding information corresponding to the other stereo channel. Implementation choices 
for coding two stereo channels include so-called "dual mono" coders using two independent coders operating 
at fixed bit rates. By contrast, "joint mono" coders use two monophonic coders but share one combined bit rate, 
i.e., the bit rate for the two coders is constrained to be less than or equal to a fixed rate, but trade-offs can 
be made between the bit rates for individual coders. "Joint stereo" coders are those that attempt to use 
interchannel properties for the stereo pair for realizing additional coding gain. 

It has been found that the independent coding of the two channels of a stereo pair, especially at low 
bit-rates, can lead to a number of undesirable psychoacoustic artifacts. Among them are those related to the 
localization of coding noise that does not match the localization of the dynamically imaged signal. Thus the 
human stereophonic perception process appears to add constraints to the encoding process if such mismatched 
localization is to be avoided. This finding is consistent with reports on binaural masking-level differences that 
appear to exist, at least for low frequencies, such that noise may be isolated spatially. Such binaural 
masking-level differences are considered to unmask a noise component that would be masked in a monophonic system. 
See, for example, B.C.J. Moore, "An Introduction to the Psychology of Hearing, Second Edition," especially 
chapter 5, Academic Press, Orlando, FL, 1982. 

One technique for reducing psychoacoustic artifacts in the stereophonic context employs the ISO-WG11- 
MPEG-Audio Psychoacoustic II [ISO] Model. In this model, a second limit of signal-to-noise ratio ("SNR") is 
applied to signal-to-noise ratios inside the psychoacoustic model. However, such additional SNR constraints 
typically require the expenditure of additional channel capacity or (in storage applications) the use of additional 
storage capacity, at low frequencies, while also degrading the monophonic performance of the coding. 

Summary of the Invention 

Limitations of the prior art are overcome and a technical advance is made in a method and apparatus for 
coding a stereo pair of high quality audio channels in accordance with aspects of the present invention. 
Interchannel redundancy and irrelevancy are exploited to achieve lower bit-rates while maintaining high quality 
reproduction after decoding. While particularly appropriate to stereophonic coding and decoding, the advantages 
of the present invention may also be realized in conventional dual monophonic stereo coders. 

An illustrative embodiment of the present invention employs a filter bank architecture using a Modified 
Discrete Cosine Transform (MDCT). In order to code the full range of signals that may be presented to the system, 
the illustrative embodiment advantageously uses both L/R (Left and Right) and M/S (Sum/Difference) coding, 
switched in both frequency and time in a signal dependent fashion. A new stereophonic noise masking model 
advantageously detects and avoids binaural artifacts in the coded stereophonic signal. Interchannel redundancy 
is exploited to provide enhanced compression without degrading audio quality. 

The time behavior of both Right and Left audio channels is advantageously accurately monitored and the 
results used to control the temporal resolution of the coding process. Thus, in one aspect, an illustrative 
embodiment of the present invention provides processing of input signals in terms of either a normal MDCT 
window, or, when signal conditions indicate, shorter windows. Further, dynamic switching between RIGHT/LEFT 
or SUM/DIFFERENCE coding modes is provided both in time and frequency to control unwanted binaural noise 
localization, to prevent the need for overcoding of SUM/DIFFERENCE signals, and to maximize the global 
coding gain. 

A typical bitstream definition and rate control loop are described which provide useful flexibility in forming 
the coder output. Interchannel irrelevancies are advantageously eliminated and stereophonic noise masking 
improved, thereby to achieve improved reproduced audio quality in jointly coded stereophonic pairs. The rate 
control method used in an illustrative embodiment uses an interpolation between absolute thresholds and 
masking threshold for signals below the rate-limit of the coder, and a threshold elevation strategy under 
rate-limited conditions. 

In accordance with an overall coder/decoder system aspect of the present invention, it proves 
advantageous to employ an improved Huffman-like entropy coder/decoder to further reduce the channel bit rate 
requirements, or storage capacity for storage applications. The noiseless compression method illustratively used 
employs Huffman coding along with a frequency-partitioning scheme to efficiently code the frequency samples 
for L, R, M and S, as may be dictated by the perceptual threshold. 

The present invention provides a mechanism for determining the scale factors to be used in quantizing 
the audio signal (i.e., the MDCT coefficients output from the analysis filter bank) by using an approach different 
from the prior art, and while avoiding many of the restrictions and costs of prior quantizer/rate-loops. The audio 
signals quantized pursuant to the present invention introduce less noise and encode into fewer bits than the 
prior art. 

These results are obtained in an illustrative embodiment of the present invention whereby the utilized scale 
factor is iteratively derived by interpolating between a scale factor derived from a calculated threshold of 
hearing at the frequency corresponding to the frequency of the respective spectral coefficient to be quantized and 
a scale factor derived from the absolute threshold of hearing at said frequency until the quantized spectral 
coefficients can be encoded within permissible limits. 

Brief Description of the Drawings 

FIG. 1 presents an illustrative prior art audio communication/storage system of a type in which aspects of 
the present invention find application, and provides improvement and extension. 

FIG. 2 presents an illustrative perceptual audio coder (PAC) in which the advances and teachings of the 
present invention find application, and provide improvement and extension. 

FIG. 3 shows a representation of a useful masking level difference factor used in threshold calculations. 

FIG. 4 presents an illustrative analysis filter bank according to an aspect of the present invention. 

FIG. 5(a) through 5(e) illustrate the operation of various window functions. 

FIG. 6 is a flow chart illustrating window switching functionality. 

FIG. 7 is a block/flow diagram illustrating the overall processing of input signals to derive the output 
bitstream. 

FIG. 8 illustrates certain threshold variations. 

FIG. 9 is a flowchart representation of certain bit allocation functionality. 

FIG. 10 shows bitstream organization. 

FIGs. 11a through 11c illustrate certain Huffman coding operations. 

FIG. 12 shows operations at a decoder that are complementary to those for an encoder. 

FIG. 13 is a flowchart illustrating certain quantization operations in accordance with an aspect of the 
present invention. 

FIG. 14(a) through 14(g) are illustrative windows for use with the filter bank of FIG. 4. 

Detailed Description 

1. Overview 

To simplify the present disclosure, the following patents, patent applications and publications are hereby 
incorporated by reference in the present disclosure as if fully set forth herein: U.S. Patent 5,040,217, issued 
August 13, 1991 by K. Brandenburg et al; United States Patent Application Serial No. 07/292,598, entitled 
Perceptual Coding of Audio Signals, filed December 30, 1988; J. D. Johnston, Transform Coding of Audio Signals 
Using Perceptual Noise Criteria, IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2 (Feb. 1988); 
International Patent Application (PCT) WO 88/01811, filed March 10, 1988; United States Patent Application 
Serial No. 07/491,373, entitled Hybrid Perceptual Coding, filed March 9, 1990; Brandenburg et al, Aspec: 
Adaptive Spectral Entropy Coding of High Quality Music Signals, AES 90th Convention (1991); Johnston, J., 
Estimation of Perceptual Entropy Using Noise Masking Criteria, ICASSP (1988); J. D. Johnston, Perceptual 
Transform Coding of Wideband Stereo Signals, ICASSP (1989); E.F. Schroeder and J.J. Platte, "'MSC': Stereo Audio 
Coding with CD-Quality and 256 kBIT/SEC," IEEE Trans. on Consumer Electronics, Vol. CE-33, No. 4, 
November 1987; and Johnston, Transform Coding of Audio Signals Using Noise Criteria, Vol. 6, No. 2, IEEE 
J.S.C.A. (Feb. 1988). 

For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising 
individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks 
represent may be provided through the use of either shared or dedicated hardware, including, but not limited 
to, hardware capable of executing software. (Use of the term "processor" should not be construed to refer 
exclusively to hardware capable of executing software.) Illustrative embodiments may comprise digital signal 
processor (DSP) hardware, such as the AT&T DSP16 or DSP32C, and software performing the operations 
discussed below. Very large scale integration (VLSI) hardware embodiments of the present invention, as well 
as hybrid DSP/VLSI embodiments, may also be provided. 

FIG. 1 is an overall block diagram of a system useful for incorporating an illustrative embodiment of the 
present invention. At the level shown, the system of FIG. 1 illustrates systems known in the prior art, but 
modifications and extensions described herein will make clear the contributions of the present invention. In FIG. 
1, an analog audio signal 101 is fed into a preprocessor 102 where it is sampled (typically at 48 KHz) and 
converted into a digital pulse code modulation ("PCM") signal 103 (typically 16 bits) in standard fashion. The PCM 
signal 103 is fed into a perceptual audio coder 104 ("PAC") which compresses the PCM signal and outputs the 
compressed PAC signal to a communications channel/storage medium 105. From the communications 
channel/storage medium the compressed PAC signal is fed into a perceptual audio decoder 107 which decompresses 
the compressed PAC signal and outputs a PCM signal 108 which is representative of the compressed PAC 
signal. From the perceptual audio decoder, the PCM signal 108 is fed into a post-processor 109 which creates 
an analog representation of the PCM signal 108. 

An illustrative embodiment of the perceptual audio coder 104 is shown in block diagram form in FIG. 2. As 
in the case of the system illustrated in FIG. 1, the system of FIG. 2, without more, may equally describe certain 
prior art systems, e.g., the system disclosed in the Brandenburg et al U.S. Patent 5,040,217. However, with 
the extensions and modifications described herein, important new results are obtained. The perceptual audio 
coder of FIG. 2 may advantageously be viewed as comprising an analysis filter bank 202, a perceptual model 
processor 204, a quantizer/rate-loop processor 206 and an entropy coder 208. 

The filter bank 202 in FIG. 2 advantageously transforms an input audio signal in time/frequency in such 
manner as to provide both some measure of signal processing gain (i.e. redundancy extraction) and a mapping 
of the filter bank inputs in a way that is meaningful in light of the human perceptual system. Advantageously, 
the well-known Modified Discrete Cosine Transform (MDCT) described, e.g., in J.P. Princen and A.B. Bradley, 
"Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation," IEEE Trans. ASSP, Vol. 
34, No. 5, October, 1986, may be adapted to perform such transforming of the input signals. 

Features of the MDCT that make it useful in the present context include its critical sampling characteristic, 
i.e. for every n samples into the filter bank, n samples are obtained from the filter bank. Additionally, the MDCT 
typically provides half-overlap, i.e. the transform length is exactly twice the length of the number of samples, 
n, shifted into the filterbank. The half-overlap provides a good method of dealing with the control of noise 
injected independently into each filter tap as well as providing a good analysis window frequency response. In 
addition, in the absence of quantization, the MDCT provides exact reconstruction of the input samples, subject 
only to a delay of an integral number of samples. 
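
The three properties just listed (critical sampling, half-overlap and exact reconstruction) can be demonstrated with a small numerical sketch. It is not the filter bank of the present disclosure: it assumes the textbook MDCT definition, a sine window satisfying the Princen-Bradley condition, and hypothetical function names, and it is written for clarity rather than speed.

```python
import math


def mdct(block):
    """Map 2N windowed samples to N coefficients (critical sampling)."""
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half) * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half))
        for n in range(2 * n_half)
    ]


def sine_window(n_half):
    """Window with w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]


def overlap_add_roundtrip(signal, hop):
    """Analyze and resynthesize with 50% overlap; len(signal) must be a multiple of hop."""
    window = sine_window(hop)
    padded = [0.0] * hop + list(signal) + [0.0] * hop
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * hop + 1, hop):
        block = [padded[start + n] * window[n] for n in range(2 * hop)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * hop):
            out[start + n] += rebuilt[n] * window[n]  # aliasing cancels in the sum
    return out[hop:hop + len(signal)]


if __name__ == "__main__":
    x = [math.sin(0.2 * i) + 0.05 * i for i in range(32)]
    y = overlap_add_roundtrip(x, hop=8)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, y)))
```

Each hop of 8 new samples produces 8 coefficients from a 16-sample block, and the overlap-added output matches the input to within floating-point rounding when no quantization is applied.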

One aspect in which the MDCT is advantageously modified for use in connection with a highly efficient 
stereophonic audio coder is the provision of the ability to switch the length of the analysis window for signal 
sections which have strongly non-stationary components in such a fashion that it retains the critically sampled 
and exact reconstruction properties. The incorporated U.S. patent application by Ferreira and Johnston, 
entitled "A METHOD AND APPARATUS FOR THE PERCEPTUAL CODING OF AUDIO SIGNALS," (referred to 
hereinafter as the "filter bank application") filed of even date with this application, describes a filter bank 
appropriate for performing the functions of element 202 in FIG. 2. 

The perceptual model processor 204 shown in FIG. 2 calculates an estimate of the perceptual importance, 
noise masking properties, or just noticeable noise floor of the various signal components in the analysis bank. 
Signals representative of these quantities are then provided to other system elements to provide improved 
control of the filtering operations and organizing of the data to be sent to the channel or storage medium. Rather 
than using the critical band by critical band analysis described in J.D. Johnston, "Transform Coding of Audio 
Signals Using Perceptual Noise Criteria," IEEE J. on Selected Areas in Communications, Feb. 1988, an 
illustrative embodiment of the present invention advantageously uses finer frequency resolution in the calculation 
of thresholds. Thus instead of using an overall tonality metric as in the last-cited Johnston paper, a tonality 
method based on that mentioned in K. Brandenburg and J.D. Johnston, "Second Generation Perceptual Audio 
Coding: The Hybrid Coder," AES 88th Convention, 1990 provides a tonality estimate that varies over frequency, 
thus providing a better fit for complex signals. 

The psychoacoustic analysis performed in the perceptual model processor 204 provides a noise threshold
for the L (Left), R (Right), M (Sum) and S (Difference) channels, as may be appropriate, for both the normal
MDCT window and the shorter windows. Use of the shorter windows is advantageously controlled entirely by
the psychoacoustic model processor.

In operation, an illustrative embodiment of the perceptual model processor 204 evaluates thresholds for
the left and right channels, denoted THR_l and THR_r. The two thresholds are then compared in each of the
illustrative 35 coder frequency partitions (56 partitions in the case of an active window-switched block). In each
partition where the two thresholds vary between left and right by less than some amount, typically 2 dB, the
coder is switched into M/S mode. That is, the left signal for that band of frequencies is replaced by M = (L+R)/2,
and the right signal is replaced by S = (L-R)/2. The actual amount of difference that triggers the last-mentioned
substitution will vary with bitrate constraints and other system parameters.
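
The per-partition M/S decision described above can be sketched as follows. This is only an illustrative sketch: the function name `ms_switch`, the use of one representative value per partition, and the default `max_diff_db=2.0` are assumptions made for the example, not details taken from the application.

```python
import numpy as np

def ms_switch(left, right, thr_l_db, thr_r_db, max_diff_db=2.0):
    """Switch partitions to M/S where the L and R thresholds nearly agree.

    All inputs hold one value per coder partition (a simplifying assumption).
    Returns the two output signals and a boolean mask of M/S partitions.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    diff = np.abs(np.asarray(thr_l_db, dtype=float) - np.asarray(thr_r_db, dtype=float))
    use_ms = diff < max_diff_db          # thresholds differ by less than ~2 dB
    m = (left + right) / 2.0             # M = (L+R)/2
    s = (left - right) / 2.0             # S = (L-R)/2
    first = np.where(use_ms, m, left)    # left slot carries M where switched
    second = np.where(use_ms, s, right)  # right slot carries S where switched
    return first, second, use_ms
```

Partitions whose thresholds differ by the trigger amount or more are passed through unchanged as L and R.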

The same threshold calculation used for L and R thresholds is also used for M and S thresholds, with the
threshold calculated on the actual M and S signals. First the basic thresholds, denoted BTHR_m and MLD_s, are
calculated. Then, the following steps are used to calculate the stereo masking contribution of the M and S
signals.

1. An additional factor is calculated for each of the M and S thresholds. This factor, called MLD_m and MLD_s,
is calculated by multiplying the spread signal energy (as derived, e.g., in J.D. Johnston, "Transform Coding
of Audio Signals Using Perceptual Noise Criteria," IEEE J. on Selected Areas in Communications, Feb.
1988; K. Brandenburg and J.D. Johnston, "Second Generation Perceptual Audio Coding: The Hybrid Coder,"
AES 89th Convention, 1990; and Brandenburg, et al. U.S. Patent 5,040,217) by a masking level difference
factor shown illustratively in FIG. 3. This calculates a second level of detectability of noise across
frequency in the M and S channels, based on the masking level differences shown in various sources.
2. The actual threshold for M (THR_m) is calculated as THR_m = max(BTHR_m, min(BTHR_s, MLD_s)) and the
threshold for S is calculated as THR_s = max(BTHR_s, min(BTHR_m, MLD_m)).

In effect, the MLD signal substitutes for the BTHR signal in cases where there is a chance of stereo
unmasking. It is not necessary to consider the issue of M and S threshold depression due to unequal L and R
thresholds, because of the fact that L and R thresholds are known to be equal.
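
As a minimal sketch of the threshold combination in step 2, assuming the reading THR_m = max(BTHR_m, min(BTHR_s, MLD_s)) and THR_s = max(BTHR_s, min(BTHR_m, MLD_m)) for a single partition (the function name and scalar arguments are illustrative assumptions):

```python
def stereo_thresholds(bthr_m, bthr_s, mld_m, mld_s):
    """Combine basic thresholds with the MLD factors for one partition.

    Each channel's threshold may rise to the smaller of the other channel's
    basic threshold and the other channel's MLD factor, but never drops
    below its own basic threshold.
    """
    thr_m = max(bthr_m, min(bthr_s, mld_s))
    thr_s = max(bthr_s, min(bthr_m, mld_m))
    return thr_m, thr_s
```

For instance, with BTHR_m = 1, BTHR_s = 5, MLD_m = 2 and MLD_s = 3, the M threshold is raised to 3 while the S threshold stays at its basic value of 5.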

The quantizer and rate control processor 206 used in the illustrative coder of FIG. 2 takes the outputs from
the analysis bank and the perceptual model, and allocates bits, noise, and controls other system parameters
so as to meet the required bit rate for the given application. In some example coders this may consist of nothing
more than quantization so that the just noticeable difference of the perceptual model is never exceeded, with
no (explicit) attention to bit rate; in some coders this may be a complex set of iteration loops that adjusts
distortion and bitrate in order to achieve a balance between bit rate and coding noise. A particularly useful
quantizer and rate control processor is described in incorporated U.S. patent application by J.D. Johnston entitled
"RATE LOOP PROCESSOR FOR PERCEPTUAL ENCODER/DECODER," (hereinafter referred to as the "rate
loop application") filed of even date with the present application. Also desirably performed by the rate loop
processor 206, and described in the rate loop application, is the function of receiving information from the
quantized analyzed signal and any requisite side information, inserting synchronization and framing information.
Again, these same functions are broadly described in the incorporated Brandenburg, et al. U.S. patent
5,040,217.

Entropy coder 208 is used to achieve a further noiseless compression in cooperation with the rate control
processor 206. In particular, entropy coder 208, in accordance with another aspect of the present invention,
advantageously receives inputs including a quantized audio signal output from quantizer/rate-loop 206,
performs a lossless encoding on the quantized audio signal, and outputs a compressed audio signal to the
communications channel/storage medium 106.

Illustrative entropy coder 208 advantageously comprises a novel variation of the minimum-redundancy
Huffman coding technique to encode each quantized audio signal. The Huffman codes are described, e.g., in
D.A. Huffman, "A Method for the Construction of Minimum Redundancy Codes", Proc. IRE, 40:1098-1101
(1952) and T.M. Cover and J.A. Thomas, Elements of Information Theory, pp. 92-101 (1991). The useful
adaptations of the Huffman codes advantageously used in the context of the coder of FIG. 2 are described in
more detail in the incorporated U.S. patent application by J.D. Johnston and J. Reeds (hereinafter the "entropy
coder application") filed of even date with the present application and assigned to the assignee of this
application. Those skilled in the data communications arts will readily perceive how to implement alternative
embodiments of entropy coder 208 using other noiseless data compression techniques, including the well-known
Lempel-Ziv compression methods.

The use of each of the elements shown in FIG. 2 will be described in greater detail in the context of the
overall system functionality; details of operation will be provided for the perceptual model processor 204.

2.1. The Analysis Filter Bank 

The analysis filter bank 202 of the perceptual audio coder 104 receives as input pulse code modulated
("PCM") digital audio signals (typically 16-bit signals sampled at 48 kHz), and outputs a representation of the
input signal which identifies the individual frequency components of the input signal. Specifically, an output of
the analysis filter bank 202 comprises a Modified Discrete Cosine Transform ("MDCT") of the input signal. See
J. Princen et al., "Sub-band Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing
Cancellation," IEEE ICASSP, pp. 2161-2164 (1987).

An illustrative analysis filter bank 202 according to one aspect of the present invention is presented in FIG.
4. Analysis filter bank 202 comprises an input signal buffer 302, a window multiplier 304, a window memory
306, an FFT processor 308, an MDCT processor 310, a concatenator 311, a delay memory 312 and a data
selector 314.

The analysis filter bank 202 operates on frames. A frame is conveniently chosen as the 2N PCM input audio
signal samples held by input signal buffer 302. As stated above, each PCM input audio signal sample is
represented by M bits. Illustratively, N = 512 and M = 16.

Input signal buffer 302 comprises two sections: a first section comprising N samples in buffer locations 1
to N, and a second section comprising N samples in buffer locations N+1 to 2N. Each frame to be coded by
the perceptual audio coder 104 is defined by shifting N consecutive samples of the input audio signal into the
input signal buffer 302. Older samples are located at higher buffer locations than newer samples.

Assuming that at a given time, the input signal buffer 302 contains a frame of 2N audio signal samples,
the succeeding frame is obtained by (1) shifting the N audio signal samples in buffer locations 1 to N into buffer
locations N+1 to 2N, respectively (the previous audio signal samples in locations N+1 to 2N may be either
overwritten or deleted), and (2) by shifting into the input signal buffer 302, at buffer locations 1 to N, N new
audio signal samples from preprocessor 102. Therefore, it can be seen that consecutive frames contain N
samples in common: the first of the consecutive frames having the common samples in buffer locations 1 to N,
and the second of the consecutive frames having the common samples in buffer locations N+1 to 2N. Analysis
filter bank 202 is a critically sampled system (i.e., for every N audio signal samples received by the input signal
buffer 302, the analysis filter bank 202 outputs a vector of N scalars to the quantizer/rate-loop 206).
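
The frame update described above can be sketched as follows. The class name `InputSignalBuffer` and the use of zero-based array indices (index 0 standing for buffer location 1) are assumptions made for illustration; the ordering of samples inside each N-sample block is left as delivered and is not specified by this sketch.

```python
import numpy as np

class InputSignalBuffer:
    """Hypothetical model of a 2N-sample buffer with 50% frame overlap."""

    def __init__(self, n):
        self.n = n
        self.data = np.zeros(2 * n)

    def shift_in(self, new_samples):
        """Move locations 1..N to N+1..2N, then load N new samples at 1..N."""
        new = np.asarray(new_samples, dtype=float)
        if new.shape != (self.n,):
            raise ValueError("expected exactly N new samples")
        self.data[self.n:] = self.data[:self.n]  # older block moves to the upper half
        self.data[:self.n] = new                 # newer block occupies the lower half
        return self.data.copy()                  # the frame of 2N samples
```

Two successive calls return frames that share N samples: the lower half of the first frame reappears as the upper half of the second.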

Each frame of the input audio signal is provided to the window multiplier 304 by the input signal buffer
302 so that the window multiplier 304 may apply seven distinct data windows to the frame.

Each data window is a vector of scalars called "coefficients". While all seven of the data windows have 2N
coefficients (i.e., the same number as there are audio signal samples in the frame), four of the seven only
have N/2 non-zero coefficients (i.e., one-fourth the number of audio signal samples in the frame). As is
discussed below, the data window coefficients may be advantageously chosen to reduce the perceptual entropy
of the output of the MDCT processor 310.

The information for the data window coefficients is stored in the window memory 306. The window memory
306 may illustratively comprise a random access memory ("RAM"), read only memory ("ROM"), or other
magnetic or optical media. Drawings of seven illustrative data windows, as applied by window multiplier 304, are
presented in FIG. 4. Typical vectors of coefficients for each of the seven data windows presented in FIG. 4
are presented in Appendix A. As may be seen in both FIG. 4 and in Appendix A, some of the data window
coefficients may be equal to zero.

Keeping in mind that the data window is a vector of 2N scalars and that the audio signal frame is also a
vector of 2N scalars, the data window coefficients are applied to the audio signal frame scalars through
point-to-point multiplication (i.e., the first audio signal frame scalar is multiplied by the first data window coefficient,
the second audio signal frame scalar is multiplied by the second data window coefficient, etc.). Window
multiplier 304 may therefore comprise seven microprocessors operating in parallel, each performing 2N
multiplications in order to apply one of the seven data windows to the audio signal frame held by the input signal buffer
302. The output of the window multiplier 304 is seven vectors of 2N scalars to be referred to as "windowed
frame vectors".

The seven windowed frame vectors are provided by window multiplier 304 to FFT processor 308. The FFT
processor 308 performs an odd-frequency FFT on each of the seven windowed frame vectors. The
odd-frequency FFT is a Discrete Fourier Transform evaluated at frequencies:

    k f_H / (2N)

where k = 1, 3, 5, ..., 2N, and f_H equals one half the sampling rate. The illustrative FFT processor 308 may
comprise seven conventional decimation-in-time FFT processors operating in parallel, each operating on a different
windowed frame vector. An output of the FFT processor 308 is seven vectors of 2N complex elements, to be
referred to collectively as "FFT vectors".
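
One common way to evaluate a DFT at odd multiples of f_H/(2N), that is, on a grid shifted by half a bin from the ordinary DFT, is to multiply the input by a complex pre-twiddle and then apply a standard FFT. The sketch below follows that approach; it is an assumption about how such a transform might be computed, not a description of FFT processor 308, and the function name is hypothetical.

```python
import numpy as np

def odd_frequency_fft(x):
    """DFT of a length-2N vector evaluated at half-bin-offset frequencies.

    Output element m equals sum_n x[n] * exp(-2j*pi*(m + 0.5)*n / len(x)),
    which corresponds to frequency (2m + 1) * f_H / (2N).
    """
    x = np.asarray(x, dtype=float)
    length = x.size
    n = np.arange(length)
    pre_twiddle = np.exp(-1j * np.pi * n / length)  # shifts the grid by half a bin
    return np.fft.fft(x * pre_twiddle)
```

The result has as many complex elements as the input has samples, matching the 2N-element FFT vectors described above.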

FFT processor 308 provides the seven FFT vectors to both the perceptual model processor 204 and the
MDCT processor 310. The perceptual model processor 204 uses the FFT vectors to direct the operation of
the data selector 314 and the quantizer/rate-loop processor 206. Details regarding the operation of data
selector 314 and perceptual model processor 204 are presented below.

MDCT processor 310 performs an MDCT based on the real components of each of the seven FFT vectors
received from FFT processor 308. MDCT processor 310 may comprise seven microprocessors operating in
parallel. Each such microprocessor determines one of the seven "MDCT vectors" of N real scalars based on
one of the seven respective FFT vectors. For each FFT vector, F(k), the resulting MDCT vector, X(k), is formed
as follows:

    X(k) = \mathrm{Re}[F(k)] \cos\left[\frac{\pi (2k+1)(1+N)}{4N}\right], \quad 1 \le k \le N.

The procedure need run k only to N, not 2N, because of redundancy in the result. To wit, for N < k \le 2N:

    X(k) = -X(2N-k).

MDCT processor 310 provides the seven MDCT vectors to concatenator 311 and delay memory 312. 

As discussed above with reference to window multiplier 304, four of the seven data windows have N/2
non-zero coefficients (see Figure 4c-f). This means that four of the windowed frame vectors contain only N/2
non-zero values. Therefore, the non-zero values of these four vectors may be concatenated into a single vector of
length 2N by concatenator 311 upon output from MDCT processor 310. The resulting concatenation of these
vectors is handled as a single vector for subsequent purposes. Thus, delay memory 312 is presented with four
MDCT vectors, rather than seven.

Delay memory 312 receives the four MDCT vectors from MDCT processor 310 and concatenator 311 for
the purpose of providing temporary storage. Delay memory 312 provides a delay of one audio signal frame
(as defined by input signal buffer 302) on the flow of the four MDCT vectors through the filter bank 202. The
delay is provided by (i) storing the two most recent consecutive sets of MDCT vectors representing consecutive
audio signal frames and (ii) presenting as input to data selector 314 the older of the consecutive sets of vectors.
Delay memory 312 may comprise random access memory (RAM) of size:

    M × 2 × 4 × N

where 2 is the number of consecutive sets of vectors, 4 is the number of vectors in a set, N is the number of
elements in an MDCT vector, and M is the number of bits used to represent an MDCT vector element.
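
As a rough worked example, taking the illustrative N = 512 given for the frame and assuming (this value is not stated for MDCT vector elements) that M = 16 bits, the delay memory size would come to:

```latex
M \times 2 \times 4 \times N = 16 \times 2 \times 4 \times 512 = 65{,}536 \text{ bits} = 8{,}192 \text{ bytes}
```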

Data selector 314 selects one of the four MDCT vectors provided by delay memory 312 to be output from
the filter bank 202 to quantizer/rate-loop 206. As mentioned above, the perceptual model processor 204 directs
the operation of data selector 314 based on the FFT vectors provided by the FFT processor 308. Due to the
operation of delay memory 312, the seven FFT vectors provided to the perceptual model processor 204 and
the four MDCT vectors concurrently provided to data selector 314 are not based on the same audio input frame,
but rather on two consecutive input signal frames: the MDCT vectors based on the earlier of the frames, and
the FFT vectors based on the later of the frames. Thus, the selection of a specific MDCT vector is based on
information contained in the next successive audio signal frame. The criteria according to which the perceptual
model processor 204 directs the selection of an MDCT vector are described in Section 2.2 below.

For purposes of an illustrative stereo embodiment, the above analysis filterbank 202 is provided for each
of the left and right channels.

2.2. The Perceptual Model Processor 

A perceptual coder achieves success in reducing the number of bits required to accurately represent high
quality audio signals, in part, by introducing noise associated with quantization of information bearing signals,
such as the MDCT information from the filter bank 202. The goal is, of course, to introduce this noise in an
imperceptible or benign way. This noise shaping is primarily a frequency analysis instrument, so it is convenient
to convert a signal into a spectral representation (e.g., the MDCT vectors provided by filter bank 202), compute
the shape and amount of the noise that will be masked by these signals, and inject it by quantizing the
spectral values. These and other basic operations are represented in the structure of the perceptual coder shown
in FIG. 2.

The perceptual model processor 204 of the perceptual audio coder 104 illustratively receives its input from
the analysis filter bank 202 which operates on successive frames. The perceptual model processor inputs then
typically comprise seven Fast Fourier Transform (FFT) vectors from the analysis filter bank 202. These are
the outputs of the FFT processor 308 in the form of seven vectors of 2N complex elements, each corresponding
to one of the windowed frame vectors.

In order to mask the quantization noise by the signal, one must consider the spectral contents of the signal
and the duration of a particular spectral pattern of the signal. These two aspects are related to masking in the
frequency domain, where signal and noise are approximately steady state (given the integration period of the
hearing system), and also with masking in the time domain, where signal and noise are subjected to different
cochlear filters. The shape and length of these filters are frequency dependent.

Masking in the frequency domain is described by the concept of simultaneous masking. Masking in the
time domain is characterized by the concept of premasking and postmasking. These concepts are extensively
explained in the literature; see, for example, E. Zwicker and H. Fastl, "Psychoacoustics, Facts and Models,"
Springer-Verlag, 1990. To make these concepts useful to perceptual coding, they are embodied in different
ways.

Simultaneous masking is evaluated by using perceptual noise shaping models. Given the spectral contents
of the signal and its description in terms of noise-like or tone-like behavior, these models produce a
hypothetical masking threshold that rules the quantization level of each spectral component. This noise shaping
represents the maximum amount of noise that may be introduced in the original signal without causing any
perceptible difference. A measure called the PERCEPTUAL ENTROPY (PE) uses this hypothetical masking
threshold to estimate the theoretical lower bound of the bitrate for transparent encoding. See J. D. Johnston,
"Estimation of Perceptual Entropy Using Noise Masking Criteria," ICASSP, 1989.

Premasking characterizes the (in)audibility of a noise that starts some time before the masker signal which
is louder than the noise. The noise amplitude must be more attenuated as the delay increases. This attenuation
level is also frequency dependent. If the noise is the quantization noise attenuated by the first half of the
synthesis window, experimental evidence indicates the maximum acceptable delay to be about 1 millisecond.

This problem is very sensitive and can conflict directly with achieving a good coding gain. Assuming
stationary conditions (which is a false premise), the coding gain is bigger for larger transforms, but the
quantization error spreads till the beginning of the reconstructed time segment. So, if a transform length of 1024
points is used, with a digital signal sampled at a rate of 48000 Hz, the noise will appear at most 21 milliseconds
before the signal. This scenario is particularly critical when the signal takes the form of a sharp transient in
the time domain commonly known as an "attack". In this case the quantization noise is audible before the
attack. The effect is known as pre-echo.
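
The 21 millisecond figure follows directly from the transform length and the sampling rate. A small sketch of the arithmetic (the helper name is hypothetical):

```python
def segment_duration_ms(num_samples, sample_rate_hz):
    """Duration in milliseconds of a block of samples at a given rate."""
    return 1000.0 * num_samples / sample_rate_hz

long_ms = segment_duration_ms(1024, 48000)   # about 21.3 ms
short_ms = segment_duration_ms(256, 48000)   # about 5.3 ms
```

Comparing these durations with the roughly 1 millisecond premasking tolerance shows why even a 256-sample window only limits, rather than eliminates, the pre-echo exposure.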

Thus, a fixed length filter bank is not a good perceptual solution nor a signal processing solution for
non-stationary regions of the signal. It will be shown later that a possible way to circumvent this problem is to
improve the temporal resolution of the coder by reducing the analysis/synthesis window length. This is
implemented as a window switching mechanism when conditions of attack are detected. In this way, the coding gain
achieved by using a long analysis/synthesis window will be affected only when such detection occurs with a
consequent need to switch to a shorter analysis/synthesis window.

Postmasking characterizes the (in)audibility of a noise when it remains after the cessation of a stronger
masker signal. In this case the acceptable delays are in the order of 20 milliseconds. Given that the bigger
transformed time segment lasts 21 milliseconds (1024 samples), no special care is needed to handle this
situation.

WINDOW SWITCHING 

The PERCEPTUAL ENTROPY (PE) measure of a particular transform segment gives the theoretical lower
bound of bits/sample to code that segment transparently. Due to its memory properties, which are related to
premasking protection, this measure shows a significant increase of the PE value to its previous value (related
with the previous segment) when some situations of strong non-stationarity of the signal (e.g. an attack) are
presented. This important property is used to activate the window switching mechanism in order to reduce
pre-echo. This window switching mechanism is not a new strategy, having been used, e.g., in the ASPEC coder,
described in the ISO/MPEG Audio Coding Report, 1990, but the decision technique behind it is new, using the
PE information to accurately localize the non-stationarity and define the right moment to operate the switch.

Two basic window lengths, 1024 samples and 256 samples, are used. The former corresponds to a
segment duration of about 21 milliseconds and the latter to a segment duration of about 5 milliseconds. Short
windows are associated in sets of 4 to represent as much spectral data as a large window (but they represent a
"different" number of temporal samples). In order to make the transition from large to short windows and
vice-versa it proves convenient to use two more types of windows. A START window makes the transition from large
(regular) to short windows and a STOP window makes the opposite transition, as shown in FIG. 5b. See the
above-cited Princen reference for useful information on this subject. Both windows are 1024 samples wide.
They are useful to keep the system critically sampled and also to guarantee the time aliasing cancellation
process in the transition region.

In order to exploit interchannel redundancy and irrelevancy, the same type of window is used for RIGHT
and LEFT channels in each segment.

The stationarity behavior of the signal is monitored at two levels: first by large regular windows, then, if
necessary, by short windows. Accordingly, the PE of the large (regular) window is calculated for every segment
while the PE of short windows is calculated only when needed. However, the tonality information for both
types is updated for every segment in order to follow the continuous variation of the signal.

Unless stated otherwise, a segment involves 1024 samples which is the length of a large regular window. 

The diagram of FIG. 5a represents all the monitoring possibilities when the segment from the point N/2 till
the point 3N/2 is being analyzed. Related to the diagram, the flowchart of FIG. 6 describes the monitoring sequence
and decision technique. We need to keep in buffer three halves of a segment in order to be able to insert a
START window prior to a sequence of short windows when necessary. FIGs. 5a-e explicitly consider the 50%
overlap between successive segments.

The process begins by analysing a "new" segment with 512 new temporal samples (the remaining 512
samples belong to the previous segment). The PE of this new segment and the differential PE to the previous
segment are calculated. If the latter value reaches a predefined threshold, then the existence of a
non-stationarity inside the current segment is declared and details are obtained by processing four short windows with
positions as represented in FIG. 5a. The PE value of each short window is calculated, resulting in the ordered
sequence: PE1, PE2, PE3 and PE4. From these values, the exact beginning of the strong non-stationarity of
the signal is deduced. Only five locations are possible. They are identified in FIG. 4a as L1, L2, L3, L4 and L5.

As will become evident, if the non-stationarity had occurred somewhere from the point N/2 till the point
15N/16, that situation would have been detected in the previous segment. It follows that the PE1 value does not contain
relevant information about the stationarity of the current segment. The average PE of the short windows is
compared with the PE of the large window of the same segment. A smaller PE reveals a more efficient coding
situation. Thus if the former value is not smaller than the latter, then we assume that we are facing a degenerate
situation and the window switching process is aborted.

It has been observed that for short windows the information about stationarity lies more in the PE value
than in the differential to the PE value of the preceding window. Accordingly, the first window that has a PE
value larger than a predefined threshold is detected. PE2 is identified with location L1, PE3 with L2 and PE4
with location L3. In either case, a START window is placed before the current segment that will be coded with
short windows. A STOP window is needed to complete the process. There are, however, two possibilities. If
the identified location where the strong non-stationarity of the signal begins is L1 or L2, then this is well inside
the short window sequence, no coding artifacts result and the coding sequence is depicted in FIG. 5b. If the
location is L4, then, in the worst situation, the non-stationarity may begin very close to the right edge of the
last short window. Previous results have consistently shown that placing a STOP window (in coding conditions)
in these circumstances degrades significantly the reconstruction of the signal in this switching point. For this
reason, another set of four short windows is placed before a STOP window. The resulting coding sequence is
represented in FIG. 5e.

If none of the short PEs is above the threshold, the remaining possibilities are L4 or L5. In this case, the
problem lies ahead of the scope of the short window sequence and the first segment in the buffer may be
immediately coded using a regular large window.

To identify the correct location, another short window must be processed. It is represented in FIG. 5a by
a dotted curve and its PE value, PE1_{n+1}, is also computed. As is easily recognized, this short window already
belongs to the next segment. If PE1_{n+1} is above the threshold, then the location is L4 and, as depicted in FIG.
5c, a START window may be followed by a STOP window. In this case the spread of the quantization noise
will be limited to the length of a short window, and a better coding gain is achieved. In the rare situation of the
location being L5, the coding is done according to the sequence of FIG. 5d. The way to prove that in this
case that is the right solution is by confirming that PE2_{n+1} will be above the threshold. PE2_{n+1} is the PE of the
short window (not represented in FIG. 5) immediately following the window identified with PE1_{n+1}.
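
The location logic described in the preceding paragraphs can be sketched roughly as below. This is a hedged illustration only: the function name, the use of a single threshold for all short windows, and the choice to average all four short-window PE values in the abort test are assumptions; the text does not fix these details.

```python
def locate_attack(pe_short, pe_large, pe1_next, threshold):
    """Return 'L1'..'L5' for the estimated attack location, or None to abort.

    pe_short  -- (PE1, PE2, PE3, PE4) of the four short windows
    pe_large  -- PE of the large window for the same segment
    pe1_next  -- PE of the first short window of the next segment
    threshold -- hypothetical predefined PE threshold for short windows
    """
    pe1, pe2, pe3, pe4 = pe_short
    # Degenerate case: short windows are not more efficient on average.
    if sum(pe_short) / 4.0 >= pe_large:
        return None
    # PE1 is skipped: an attack there would have been caught one segment earlier.
    for label, pe in (("L1", pe2), ("L2", pe3), ("L3", pe4)):
        if pe > threshold:
            return label
    # Nothing inside the short window sequence: look one short window ahead.
    return "L4" if pe1_next > threshold else "L5"
```

The returned label would then select among the coding sequences of FIGs. 5b through 5e as described in the text.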

As mentioned before, for each segment, RIGHT and LEFT channels use the same type of
analysis/synthesis window. This means that a switch is done for both channels when at least one channel requires it.

It has been observed that for low bitrate applications the solution of FIG. 5c, although representing a good
local psychoacoustic solution, demands an unreasonably large number of bits that may adversely affect the
coding quality of subsequent segments. For this reason, that coding solution may eventually be inhibited.

It is also evident that the details of the reconstructed signal when short windows are used are closer to
the original signal than when only regular large windows are used. This is so because the attack is basically a
wide bandwidth signal and may only be considered stationary for very short periods of time. Since short
windows have a greater temporal resolution than large windows, they are able to follow and reproduce with more
fidelity the varying pattern of the spectrum. In other words, this is the difference between a more precise local
(in time) quantization of the signal and a global (in frequency) quantization of the signal.

The final masking threshold of the stereophonic coder is calculated using a combination of monophonic
and stereophonic thresholds. While the monophonic threshold is computed independently for each channel,
the stereophonic one considers both channels.

The independent masking threshold for the RIGHT or the LEFT channel is computed using a
psychoacoustic model that includes an expression for tone masking noise and noise masking tone. The latter is used
as a conservative approximation for a noise masking noise expression. The monophonic threshold is calculated
using the same procedure as previous work. In particular, a tonality measure considers the evolution of the
power and the phase of each frequency coefficient across the last three segments to identify the signal as
being more tone-like or noise-like. Accordingly, each psychoacoustic expression is more or less weighted than
the other. These expressions found in the literature were updated for better performance. They are defined
as:

    TMN_{dB} = 19.5 + bark \cdot \frac{18.0}{26.0}

    NMT_{dB} = 6.56 - bark \cdot \frac{3.06}{26.0}

where bark is the frequency in Bark scale. This scale is related to what we may call the cochlear filters
or critical bands which, in turn, are identified with constant length segments of the basilar membrane. The final
threshold is adjusted to consider absolute thresholds of masking and also to consider a partial premasking
protection.

A brief description of the complete monophonic threshold calculation follows. Some terminology must be
introduced in order to simplify the description of the operations involved.

The spectrum of each segment is organized in three different ways, each one following a different purpose. 

1. First, it may be organized in partitions. Each partition has associated one single Bark value. These
partitions provide a resolution of approximately either one MDCT line or 1/3 of a critical band, whichever is
wider. At low frequencies a single line of the MDCT will constitute a coder partition. At high frequencies,
many lines will be combined into one coder partition. In this case the Bark value associated is the median
Bark point of the partition. This partitioning of the spectrum is necessary to ensure an acceptable resolution
for the spreading function. As will be shown later, this function represents the masking influence among
neighboring critical bands.

2. Secondly, the spectrum may be organized in bands. Bands are defined by a parameter file. Each band
groups a number of spectral lines that are associated with a single scale factor that results from the final
masking threshold vector.

3. Finally, the spectrum may also be organized in sections. It will be shown later that sections involve an
integer number of bands and represent a region of the spectrum coded with the same Huffman code book.

Three indices for data values are used. These are:
ω indicates that the calculation is indexed by frequency in the MDCT line domain.
b indicates that the calculation is indexed in the threshold calculation partition domain. In the case
where we do a convolution or sum in that domain, bb will be used as the summation variable.
n indicates that the calculation is indexed in the coder band domain.

1 . The index of the calculation partitk>n. b. 

2. The lowest frequency line in the partition, adowb. 
2$ 3. The highest frequency line in the partition, uhlghb. 

4. The median bark value of the partition, bval^. 

5. The value for tone masking noise (in dB) for the partition, TMN^. 

6. The value for noise masking tone (in dB) for the partition. NMT». 

Several points in the following description refer to the "spreading function". It is calculated by the following
method:

$$tmpx = 1.05\,(j - i),$$

where i is the bark value of the signal being spread, j the bark value of the band being spread into, and tmpx
is a temporary variable.

$$x = 8 \cdot \mathrm{minimum}\big((tmpx - .5)^2 - 2(tmpx - .5),\ 0\big)$$

where x is a temporary variable, and minimum(a,b) is a function returning the more negative of a or b.

$$tmpy = 15.811389 + 7.5\,(tmpx + .474) - 17.5\,\big(1. + (tmpx + .474)^2\big)^{.5}$$

where tmpy is another temporary variable,
if (tmpy < -100) then {sprdngf(i,j) = 0} else {sprdngf(i,j) = 10^{(x + tmpy)/10}}.
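The spreading function can be sketched in a few lines of Python. This is only an illustration under our reading of the printed expressions (in particular the factor 8, the squared terms and the exponent (x + tmpy)/10); the function and argument names are chosen here for convenience.

```python
import math


def sprdngf(i: float, j: float) -> float:
    """Spreading from bark value i (signal being spread) into bark value j."""
    tmpx = 1.05 * (j - i)
    # min() returns the more negative of its two arguments, as the text requires.
    x = 8.0 * min((tmpx - 0.5) ** 2 - 2.0 * (tmpx - 0.5), 0.0)
    tmpy = (15.811389 + 7.5 * (tmpx + 0.474)
            - 17.5 * math.sqrt(1.0 + (tmpx + 0.474) ** 2))
    if tmpy < -100.0:
        return 0.0
    return 10.0 ** ((x + tmpy) / 10.0)
```

With these constants a partition spreads into itself with a weight very close to 1, and the weight falls off as the bark distance grows.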

Steps in Threshold Calculation

The following steps are the necessary steps for calculating the SMR_n used in the coder.

1. Concatenate 512 new samples of the input signal to form another 1024-sample segment. Please refer
to FIG. 5a.

2. Calculate the complex spectrum of the input signal using the O-FFT as described in 2.0 and using a
sine window.

3. Calculate a predicted r and φ.

The polar representation of the transform is calculated. r_ω and φ_ω represent the magnitude and phase
components of a spectral line of the transformed segment.
A predicted magnitude, r̂_ω, and phase, φ̂_ω, are calculated from the preceding two threshold calculation
blocks' r and φ:

$$\hat{r}_\omega = 2\,r_\omega(t-1) - r_\omega(t-2)$$
$$\hat{\phi}_\omega = 2\,\phi_\omega(t-1) - \phi_\omega(t-2)$$

where t represents the current block number, t-1 indexes the previous block's data, and t-2 indexes the data
from the threshold calculation block before that.

4. Calculate the unpredictability measure c_ω.
c_ω, the unpredictability measure, is:

$$c_\omega = \frac{\big((r_\omega\cos\phi_\omega - \hat{r}_\omega\cos\hat{\phi}_\omega)^2 + (r_\omega\sin\phi_\omega - \hat{r}_\omega\sin\hat{\phi}_\omega)^2\big)^{.5}}{r_\omega + \mathrm{abs}(\hat{r}_\omega)}$$
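Steps 3 and 4 can be illustrated for a single spectral line as follows. This is a minimal sketch, assuming the unpredictability measure is the Euclidean distance between the actual and the predicted complex line, divided by the sum of the actual magnitude and the absolute predicted magnitude; the zero-denominator guard and the function names are additions made here.

```python
import math


def predict(prev1: float, prev2: float) -> float:
    """Linear prediction from the two preceding blocks: 2*v(t-1) - v(t-2)."""
    return 2.0 * prev1 - prev2


def unpredictability(r: float, phi: float, r_hat: float, phi_hat: float) -> float:
    """Unpredictability of one line given actual (r, phi) and predicted values."""
    dx = r * math.cos(phi) - r_hat * math.cos(phi_hat)
    dy = r * math.sin(phi) - r_hat * math.sin(phi_hat)
    denom = r + abs(r_hat)
    if denom == 0.0:
        return 0.0  # assumption: a silent line is treated as fully predictable
    return math.hypot(dx, dy) / denom
```

A perfectly predicted line yields 0, while a line whose phase is opposite to the prediction yields 1.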



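5. Calculate the energy and unpredictability in the threshold calculation partitions.
The energy in each partition, e_b, is:

$$e_b = \sum_{\omega=\omega low_b}^{\omega high_b} r_\omega^2$$

and the weighted unpredictability, c_b, is:

$$c_b = \sum_{\omega=\omega low_b}^{\omega high_b} r_\omega^2\, c_\omega$$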

6. Convolve the partitioned energy and unpredictability with the spreading function.

$$ecb_b = \sum_{bb=1}^{bmax} e_{bb}\, sprdngf(bval_{bb}, bval_b)$$

$$ct_b = \sum_{bb=1}^{bmax} c_{bb}\, sprdngf(bval_{bb}, bval_b)$$

Because ct_b is weighted by the signal energy, it must be renormalized to cb_b:

$$cb_b = \frac{ct_b}{ecb_b}$$

At the same time, due to the non-normalized nature of the spreading function, ecb_b should be renormalized
and the normalized energy en_b calculated:

$$en_b = ecb_b \cdot rnorm_b$$

The normalization coefficient, rnorm_b, is:

$$rnorm_b = \frac{1}{\sum_{bb=1}^{bmax} sprdngf(bval_{bb}, bval_b)}$$
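The convolution and the two renormalizations of step 6 can be sketched as below. This is an illustration only: the spreading function is passed in as a callable `spread(bval_source, bval_target)`, the normalized energy is taken to be the spread energy multiplied by rnorm_b = 1 / (sum of spreading weights), and the zero-energy guard is an assumption made here.

```python
def spread_partitions(e, c, bval, spread):
    """Return (cb, en) per partition from energies e and weighted unpredictabilities c."""
    nparts = len(e)
    cb, en = [], []
    for b in range(nparts):
        weights = [spread(bval[bb], bval[b]) for bb in range(nparts)]
        ecb = sum(e[bb] * weights[bb] for bb in range(nparts))  # spread energy
        ct = sum(c[bb] * weights[bb] for bb in range(nparts))   # spread unpredictability
        rnorm = 1.0 / sum(weights)                              # normalization coefficient
        cb.append(ct / ecb if ecb > 0.0 else 0.0)               # undo the energy weighting
        en.append(ecb * rnorm)                                  # normalized energy
    return cb, en
```

For a flat spectrum the normalized energy equals the input energy, which is a quick sanity check on the normalization.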



7. Convert cb_b to tb_b.

$$tb_b = -.299 - .43\,\log_e(cb_b)$$

Each tb_b is limited to the range 0 ≤ tb_b ≤ 1.

8. Calculate the required SNR in each partition.

$$TMN_b = 19.5 + bval_b\,\frac{18.0}{26.0}$$

$$NMT_b = 6.56 - bval_b\,\frac{3.06}{26.0}$$

where TMN_b is the tone masking noise in dB and NMT_b is the noise masking tone value in dB.
The required signal to noise ratio, SNR_b, is:

$$SNR_b = tb_b\,TMN_b + (1 - tb_b)\,NMT_b$$



9. Calculate the power ratio.

The power ratio, bc_b, is:

$$bc_b = 10^{-SNR_b/10}$$

10. Calculation of actual energy threshold, nb_b.

$$nb_b = en_b\, bc_b$$

11. Spread the threshold energy over MDCT lines, yielding nb_ω.

$$nb_\omega = \frac{nb_b}{\omega high_b - \omega low_b + 1}$$

12. Include absolute thresholds, yielding the final energy threshold of audibility, thr_ω.

$$thr_\omega = \max(nb_\omega,\ absthr_\omega).$$

The dB values of absthr shown in the "Absolute Threshold Tables" are relative to the level that a sine wave
of ±1/2 lsb has in the MDCT used for threshold calculation.

The dB values must be converted into the energy domain after considering the MDCT normalization
actually used.
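Steps 7 through 12 chain together naturally, as the following sketch shows for one partition. It is an illustration under stated assumptions: the masking values TMN_b and NMT_b are passed in as parameters rather than computed, the power ratio is taken as 10^(-SNR/10), `absthr` is assumed to be already converted to the energy domain, and all names are hypothetical.

```python
import math


def tonality_index(cb: float) -> float:
    """Map normalized unpredictability cb to a tonality index clipped to [0, 1]."""
    if cb <= 0.0:
        return 1.0  # assumption: zero unpredictability is treated as fully tonal
    return min(1.0, max(0.0, -0.299 - 0.43 * math.log(cb)))


def partition_threshold(en: float, cb: float, tmn_db: float, nmt_db: float) -> float:
    """Energy threshold nb_b of one partition from its normalized energy."""
    tb = tonality_index(cb)
    snr_db = tb * tmn_db + (1.0 - tb) * nmt_db  # required SNR in dB
    bc = 10.0 ** (-snr_db / 10.0)               # power ratio
    return en * bc


def line_thresholds(nb: float, w_low: int, w_high: int, absthr):
    """Spread nb_b evenly over the partition's lines, then apply the absolute threshold."""
    per_line = nb / (w_high - w_low + 1)
    return [max(per_line, absthr[w]) for w in range(w_low, w_high + 1)]
```

For example, a noise-like partition (tonality index 0) with a 10 dB noise-masking-tone value has its threshold placed 10 dB below its normalized energy.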

13. Pre-echo control.

14. Calculate the signal to mask ratios, SMR_n.
The table of "Bands of the Coder" shows

1. The index, n, of the band.

2. The upper index, ωhigh_n, of the band n. The lower index, ωlow_n, is computed from the previous band as
ωhigh_{n-1} + 1.

To further classify each band, another variable is created. The width index, width_n, will assume a value
width_n = 1 if n is a perceptually narrow band, and width_n = 0 if n is a perceptually wide band. The former case
occurs if

$$bval_{\omega high_n} - bval_{\omega low_n} \le bandlength$$

bandlength is a parameter set in the initialization routine. Otherwise the latter case is assumed.
Then, if (width_n = 1), the noise level in the coder band, nband_n, is calculated as:

$$nband_n = \frac{\sum_{\omega=\omega low_n}^{\omega high_n} thr_\omega}{\omega high_n - \omega low_n + 1}$$

else,

$$nband_n = \mathrm{minimum}(thr_{\omega low_n}, \ldots, thr_{\omega high_n})$$

where, in this case, minimum(a, ..., z) is a function returning the most negative or smallest positive
argument of the arguments a...z.
The ratios to be sent to the decoder, SMR_n, are calculated as:

$$SMR_n = 10 \cdot \log_{10}\!\left(\frac{[12.0 \cdot nband_n]^{0.5}}{\mathrm{minimum}(absthr)}\right)$$
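The band noise level rule can be sketched as follows. This assumes, as our reading of the printed formulas, that a perceptually narrow band takes the average of its line thresholds while a wide band takes the minimum; the function name and the boolean `narrow` flag (standing in for width_n = 1) are hypothetical.

```python
def band_noise_level(thr, w_low: int, w_high: int, narrow: bool) -> float:
    """Noise level of coder band n from per-line thresholds thr[w_low..w_high]."""
    lines = thr[w_low:w_high + 1]
    if narrow:
        return sum(lines) / len(lines)  # narrow band: average threshold
    return min(lines)                   # wide band: most demanding line
```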

It is important to emphasize that since the tonality measure is the output of a spectrum analysis process,
the analysis window has a sine form for all the cases of large or short segments. In particular, when a segment
is chosen to be coded as a START or STOP window, its tonality information is obtained considering a sine
window; the remaining operations, e.g. the threshold calculation and the quantization of the coefficients, consider
the spectrum obtained with the appropriate window.



STEREOPHONIC THRESHOLD 

The stereophonic threshold has several goals. It is known that most of the time the two channels sound
"alike". Thus, some correlation exists that may be converted into coding gain. Looking into the temporal
representation of the two channels, this correlation is not obvious. However, the spectral representation has a
number of interesting features that may advantageously be exploited. In fact, a very practical and useful possibility
is to create a new basis to represent the two channels. This basis involves two orthogonal vectors, the vector
SUM and the vector DIFFERENCE, defined by the following linear combination:

$$\begin{bmatrix} SUM \\ DIF \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} RIGHT \\ LEFT \end{bmatrix}$$
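The change of basis and its inverse are simple enough to show directly. The sketch below operates on lists of spectral coefficients; the function names are chosen here for illustration, and the inverse (RIGHT = SUM + DIF, LEFT = SUM - DIF) follows from the matrix with the 1/2 factor.

```python
def to_sum_diff(right, left):
    """SUM = (RIGHT + LEFT) / 2 and DIF = (RIGHT - LEFT) / 2, line by line."""
    s = [0.5 * (r + l) for r, l in zip(right, left)]
    d = [0.5 * (r - l) for r, l in zip(right, left)]
    return s, d


def from_sum_diff(s, d):
    """Inverse mapping back to the RIGHT and LEFT channels."""
    right = [a + b for a, b in zip(s, d)]
    left = [a - b for a, b in zip(s, d)]
    return right, left
```

When the two channels are identical, DIF is zero everywhere, which is the energy concentration the text refers to.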



These vectors, which have the length of the window being used, are generated in the frequency domain
since the transform process is by definition a linear operation. This has the advantage of simplifying the
computational load.

The first goal is to have a more decorrelated representation of the two signals. The concentration of most
of the energy in one of these new channels is a consequence of the redundancy that exists between RIGHT
and LEFT channels and, on average, always leads to a coding gain.

A second goal is to correlate the quantization noise of the RIGHT and LEFT channels and control the
localization of the noise or the unmasking effect. This problem arises if RIGHT and LEFT channels are quantized
and coded independently. This concept is exemplified by the following context: supposing that the threshold
of masking for a particular signal has been calculated, two situations may be created. First, we add to the signal
an amount of noise that corresponds to the threshold. If we present this same signal with this same noise to
the two ears, then the noise is masked. However, if we add an amount of noise that corresponds to the threshold
to the signal and present this combination to one ear, and do the same operation for the other ear but with
noise uncorrelated with the previous one, then the noise is not masked. In order to achieve masking again,
the noise at both ears must be reduced by a level given by the masking level differences (MLD).

The unmasking problem may be generalized to the following form: the quantization noise is not masked
if it does not follow the localization of the masking signal. Hence, in particular, we may have two limit cases:
center localization of the signal with unmasking more noticeable on the sides of the listener, and side
localization of the signal with unmasking more noticeable on the center line.

The new vectors SUM and DIFFERENCE are very convenient because they express the signal localized
on the center and also on both sides of the listener. Also, they make it possible to control the quantization
noise with center and side image. Thus, the unmasking problem is solved by controlling the protection level
for the MLD through these vectors. Based on some psychoacoustic information and other experiments and
results, the MLD protection is particularly critical for very low frequencies to about 3 kHz. It appears to depend
only on the signal power and not on its tonality properties. The following expression for the MLD proved to
give good results:

$$MLD_{dB}(i) = 25.5\left[\cos\frac{\pi\, b(i)}{32.0}\right]^2$$

where i is the partition index of the spectrum (see [7]), and b(i) is the bark frequency of the center of the partition
i. This expression is only valid for b(i) ≤ 16.0, i.e. for frequencies below 3 kHz. The expression for the MLD
threshold is given by:

$$THR_{MLD}(i) = C(i)\,10^{-MLD_{dB}(i)/10}$$

C(i) is the spread signal energy on the basilar membrane, corresponding only to the partition i.
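As a worked illustration, the MLD curve and the corresponding threshold can be computed as below. The sketch assumes our reading of the expression as 25.5 times the squared cosine of pi*b/32, which falls to zero at 16 bark and is therefore consistent with the stated validity range; the function names and the error raised outside that range are choices made here.

```python
import math


def mld_db(bark: float) -> float:
    """Masking level difference in dB at a partition centered at `bark`."""
    if bark > 16.0:
        raise ValueError("expression is only valid for bark values up to 16.0")
    return 25.5 * math.cos(math.pi * bark / 32.0) ** 2


def thr_mld(spread_energy: float, bark: float) -> float:
    """MLD threshold from the spread signal energy C(i) of the partition."""
    return spread_energy * 10.0 ** (-mld_db(bark) / 10.0)
```

At the lowest frequencies the protection is 25.5 dB; at 16 bark it vanishes and the MLD threshold equals the spread energy itself.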
A third and last goal is to take advantage of a particular stereophonic signal image to extract irrelevance
from directions of the signal that are masked by that image. In principle, this is done only when the stereo
image is strongly defined in one direction, in order to not compromise the richness of the stereo signal. Based
on the vectors SUM and DIFFERENCE, this goal is implemented by postulating the following two dual
principles:

1. If there is a strong depression of the signal (and hence of the noise) on both sides of the listener, then
an increase of the noise on the middle line (center image) is perceptually tolerated. The upper bound is
the side noise.

2. If there is a strong localization of the signal (and hence of the noise) on the middle line, then an increase
of the (correlated) noise on both sides is perceptually tolerated. The upper bound is the center noise.
However, any increase of the noise level must be corrected by the MLD threshold.
According to these goals, the final stereophonic threshold is computed as follows. First, the thresholds
for channels SUM and DIFFERENCE are calculated using the monophonic models for noise-masking-tone
and tone-masking-noise. The procedure is exactly the one presented in 3.2 up to step 10. At this point we have
the actual energy threshold per band, nb_b, for both channels. For convenience, we call them THRn_SUM and
THRn_DIF, respectively for the channel SUM and the channel DIFFERENCE.

Secondly, the MLD thresholds for both channels, i.e. THRn_MLD,SUM and THRn_MLD,DIF, are also calculated by:

$$THRn_{MLD,SUM} = en_{b,SUM}\,10^{-MLD_{n,dB}/10}$$
$$THRn_{MLD,DIF} = en_{b,DIF}\,10^{-MLD_{n,dB}/10}$$

The MLD protection and the stereo irrelevance are considered by computing:

$$nthr_{SUM} = MAX[THRn_{SUM},\ MIN(THRn_{DIF},\ THRn_{MLD,DIF})]$$
$$nthr_{DIF} = MAX[THRn_{DIF},\ MIN(THRn_{SUM},\ THRn_{MLD,SUM})]$$
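The final combination is a per-band max/min rule, sketched below for one band. The argument names are hypothetical; each is assumed to be an energy-domain threshold for the same band.

```python
def stereo_thresholds(thr_sum, thr_dif, thr_mld_sum, thr_mld_dif):
    """Return (nthr_SUM, nthr_DIF) for one band."""
    # The SUM threshold may be raised toward the DIFFERENCE threshold,
    # but never above the MLD-protected level of the DIFFERENCE channel.
    nthr_sum = max(thr_sum, min(thr_dif, thr_mld_dif))
    # And symmetrically for the DIFFERENCE channel.
    nthr_dif = max(thr_dif, min(thr_sum, thr_mld_sum))
    return nthr_sum, nthr_dif
```

Neither threshold is ever lowered by this step: each result is at least the channel's own monophonic threshold.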
After these operations, the remaining steps after the 11th, as presented in 3.2, are also taken for both
channels. In essence, these last thresholds are further adjusted to consider the absolute threshold and also a partial
premasking protection. It must be noticed that this premasking protection was simply adopted from the
monophonic case. It considers a monaural time resolution of about 2 milliseconds. However, the binaural time
resolution is as accurate as 6 microseconds! To conveniently code stereo signals with relevant stereo image based
on interchannel time differences is a subject that needs further investigation.



STEREOPHONIC CODER 



The simplified structure of the stereophonic coder is presented in FIG. 12. For each segment of data being
analyzed, detailed information about the independent and relative behavior of both signal channels may be
available through the information given by large and short transforms. This information is used according to
the necessary number of steps needed to code a particular segment. These steps involve essentially the
selection of the analysis window, the definition on a band basis of the coding mode (R/L or S/D), the quantization
and Huffman coding of the coefficients and scale factors and, finally, the bitstream composing.



Coding Mode Selection 

When a new segment is read, the tonality updating for large and short analysis windows is done.
Monophonic thresholds and the PE values are calculated according to the technique described in Section 3.1. This
gives the first decision about the type of window to be used for both channels.

Once the window sequence is chosen, an orthogonal coding decision is then considered. It involves the
choice between independent coding of the channels, mode RIGHT/LEFT (R/L), or joint coding using the SUM
and DIFFERENCE channels (S/D). This decision is taken on a band basis of the coder. This is based on the
assumption that the binaural perception is a function of the output of the same critical bands at the two ears.
If the threshold at the two channels is very different, then there is no need for MLD protection and the signals
will not be more decorrelated if the channels SUM and DIFFERENCE are considered. If the signals are such
that they generate a stereo image, then an MLD protection must be activated and additional gains may be
exploited by choosing the S/D coding mode. A convenient way to detect this latter situation is by comparing the
monophonic threshold between RIGHT and LEFT channels. If the thresholds in a particular band do not differ
by more than a predefined value, e.g. 2 dB, then the S/D coding mode is chosen. Otherwise the independent
mode R/L is assumed. Associated with each band is a one bit flag that specifies the coding mode of that
band and that must be transmitted to the decoder as side information. From now on it is called a coding mode
flag.
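The per-band decision can be illustrated with a short sketch. It assumes the monophonic thresholds are positive energies and that "differ by more than a predefined value" is measured as the absolute difference of the two thresholds expressed in dB; the function name and the boolean encoding of the flag (True for S/D, False for R/L) are hypothetical.

```python
import math


def coding_mode_flags(thr_right, thr_left, max_diff_db: float = 2.0):
    """Return one flag per band: True selects S/D coding, False selects R/L."""
    flags = []
    for tr, tl in zip(thr_right, thr_left):
        diff_db = abs(10.0 * math.log10(tr / tl))  # threshold difference in dB
        flags.append(diff_db <= max_diff_db)
    return flags
```

Because the flags are computed band by band and segment by segment, the decision is adaptive in both frequency and time.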

The coding mode decision is adaptive in time since for the same band it may differ for subsequent
segments, and is also adaptive in frequency since for the same segment the coding mode for subsequent bands
may be different. An illustration of a coding decision is given in FIG. 13. This illustration is valid for long and
also short segments.

At this point it is clear that since the window switching mechanism involves only monophonic measures,
the maximum number of PE measures per segment is 10 (2 channels × [1 large window + 4 short windows]).
However, the maximum number of thresholds that we may need to compute per segment is 20 and therefore
20 tonality measures must always be updated per segment (4 channels × [1 large window + 4 short windows]).



Bitrate Adjustment

It was previously said that the decisions for window switching and for coding mode selection are orthogonal
in the sense that they do not depend on each other. Independent of these decisions is also the final step of
the coding process that involves quantization, Huffman coding and bitstream composing; i.e. there is no
feedback path. This fact has the advantage of reducing the whole coding delay to a minimum value (1024/48000
= 21.3 milliseconds) and also of avoiding instabilities due to unorthodox coding situations.

The quantization process affects both spectral coefficients and scale factors. Spectral coefficients are
clustered in bands, each band having the same step size or scale factor. Each step size is directly computed
from the masking threshold corresponding to its band, as seen in 3.2, step 14. The quantized values, which
are integer numbers, are then converted to variable word length or Huffman codes. The total number of bits
to code the segment, considering additional fields of the bitstream, is computed. Since the bitrate must be kept
constant, the quantization process must be iteratively done till that number of bits is within predefined limits.
After the number of bits needed to code the whole segment is computed, considering the basic masking
threshold, the degree of adjustment is dictated by a buffer control unit. This control unit shares the deficit or
credit of additional bits among several segments, according to the needs of each one.

The technique of the bitrate adjustment routine is represented by the flowchart of FIG. 9. It may be seen
that after the total number of available bits to be used by the current segment is computed, an iterative
procedure tries to find a factor α such that if all the initial thresholds are multiplied by this factor, the final total
number of bits is smaller than and within an error δ of the available number of bits. Even if the approximation
curve is so hostile that α is not found within the maximum number of iterations, one acceptable solution is
always available.

The main steps of this routine are as follows. First an interval including the solution is found. Then, a loop
seeks to rapidly converge to the solution. At each iteration, the best solution is updated.
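The two phases (bracketing, then convergence while tracking the best solution) can be sketched as below. This is not the routine of FIG. 9: plain interval halving is used as a stand-in for the convergence loop, `count_bits` is a hypothetical callable returning the total number of bits when all thresholds are multiplied by the factor, and it is assumed to be non-increasing in that factor.

```python
def find_threshold_factor(count_bits, available, delta, max_iter=32):
    """Search a factor whose bit count is <= available and within delta of it."""
    lo, hi = 1.0, 1.0
    # Phase 1a: raise hi until the segment fits into the available bits.
    for _ in range(max_iter):
        if count_bits(hi) <= available:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("no fitting factor found while bracketing")
    # Phase 1b: lower lo until it no longer fits, so [lo, hi] brackets the solution.
    for _ in range(max_iter):
        if count_bits(lo) > available:
            break
        hi, lo = lo, 0.5 * lo
    # Phase 2: halve the interval, always keeping the best fitting factor.
    best = hi
    for _ in range(max_iter):
        if available - count_bits(best) <= delta:
            break
        mid = 0.5 * (lo + hi)
        if count_bits(mid) <= available:
            hi = best = mid
        else:
            lo = mid
    return best
```

Because `best` always fits into the available bits, an acceptable (if not optimal) factor is returned even when the iteration limit is reached.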

In order to use the same procedure for segments coded with large and short windows, in this latter case
the coefficients of the 4 short windows are clustered by concatenating homologous bands. Scale factors are
clustered in the same way.

The bitrate adjustment routine calls another routine that computes the total number of bits to represent all
the Huffman coded words (coefficients and scale factors). This latter routine does a spectrum partitioning
according to the amplitude distribution of the coefficients. The goal is to assign predefined Huffman code books
to sections of the spectrum. Each section groups a variable number of bands and its coefficients are Huffman
coded with a convenient book. The limits of the section and the reference of the code book must be sent to
the decoder as side information.

The spectrum partitioning is done using a minimum cost strategy. The main steps are as follows. First, all
possible sections are defined (the limit is one section per band), each one having the code book that best
matches the amplitude distribution of the coefficients within that section. As the beginning and the end of the whole
spectrum is known, if K is the number of sections, there are K-1 separators between sections. The price to
eliminate each separator is computed. The separator that has the lowest price is eliminated (initial prices may be
negative). Prices are computed again before the next iteration. This process is repeated till a maximum
allowable number of sections is obtained and the smallest price to eliminate another separator is higher than a
predefined value.
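The separator elimination loop can be sketched as a greedy merge. In this illustration `section_cost` is a hypothetical callable returning the number of bits needed to code a section with its best code book plus its side information, and the price of a separator is taken as the change in total cost when the two neighboring sections are merged.

```python
def merge_sections(bands, section_cost, max_sections, price_limit):
    """Greedily eliminate the cheapest separator until the stop condition holds."""
    sections = [[b] for b in bands]  # start with one section per band
    while len(sections) > 1:
        # Price of removing separator k = cost after merging minus cost before.
        prices = [
            section_cost(sections[k] + sections[k + 1])
            - section_cost(sections[k]) - section_cost(sections[k + 1])
            for k in range(len(sections) - 1)
        ]
        k = min(range(len(prices)), key=prices.__getitem__)
        if len(sections) <= max_sections and prices[k] > price_limit:
            break
        sections[k:k + 2] = [sections[k] + sections[k + 1]]
    return sections
```

As a usage example, with bands represented by their peak amplitudes and a toy cost of 5 side-information bits plus one fixed-width field per band, `merge_sections([1, 1, 8, 8], cost, 4, 0)` groups the two small-amplitude bands and the two large-amplitude bands into separate sections.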

Aspects of the processing accomplished by quantizer/rate-loop 206 in FIG. 2 will now be presented. In the
prior art, rate-loop mechanisms have contained assumptions related to the monophonic case. With the shift
from monophonic to stereophonic perceptual coders, the demands placed upon the rate-loop are increased.

The inputs to quantizer/rate-loop 206 in FIG. 2 comprise spectral coefficients (i.e., the MDCT coefficients)
derived by analysis filter bank 202, and outputs of perceptual model 204, including calculated thresholds
corresponding to the spectral coefficients.

Quantizer/rate-loop 206 quantizes the spectral information based, in part, on the calculated thresholds
and the absolute thresholds of hearing and in doing so provides a bitstream to entropy coder 208. The bitstream
includes signals divided into three parts: (1) a first part containing the standardized side information; (2) a
second part containing the scaling factors for the 35 or 56 bands and additional side information used for so-called
adaptive-window switching, when used (the length of this part can vary depending on information in the first
part); and (3) a third part comprising the quantized spectral coefficients.

A "utilized scale factor", Δ, is iteratively derived by interpolating between a calculated scale factor and a
scale factor derived from the absolute threshold of hearing at the frequency corresponding to the frequency
of the respective spectral coefficient to be quantized, until the quantized spectral coefficients can be encoded
within permissible limits.

An illustrative embodiment of the present invention can be seen in FIG. W. As shown at W01, quantizer/rate-loop
receives a spectral coefficient and an energy threshold, E, corresponding to that spectral coefficient.
A "threshold scale factor", Δ_0, is calculated by

$$\Delta_0 = \sqrt{12E}$$

An "absolute scale factor", Δ_A, is also calculated based upon the absolute threshold of hearing (i.e., the quietest
sound that can be heard at the frequency corresponding to the scale factor). Advantageously, an interpolation
constant, α, and interpolation bounds α_high and α_low are initialized to aid in the adjustment of the utilized scale
factor.

$$\alpha_{high} = 1$$

Next, as shown in W05, the utilized scale factor is determined from:

$$\Delta = \Delta_0^{\alpha} \times \Delta_A^{(1-\alpha)}$$

Next, as shown in W07, the utilized scale factor is itself quantized because the utilized scale factor as
computed above is not discrete but is advantageously discrete when transmitted and used.

$$\Delta = Q^{-1}(Q(\Delta))$$

Next, as shown in W09, the spectral coefficient is quantized using the utilized scale factor to create a
"quantized spectral coefficient" Q(C_f, Δ).

$$Q(C_f, \Delta) = NINT\!\left(\frac{C_f}{\Delta}\right)$$

where "NINT" is the nearest integer function. Because quantizer/rate loop 206 must transmit both the quantized
spectral coefficient and the utilized scale factor, a cost, C, is calculated which is associated with how many
bits it will take to transmit them both. As shown in FIG. W at W11,

$$C = FOO(Q(C_f, \Delta),\ Q(\Delta))$$

where FOO is a function which, depending on the specific embodiment, can be easily determined by persons
having ordinary skill in the art of data communications. As shown in W13, the cost, C, is tested to determine
whether it is in a permissible range PR. When the cost is within the permissible range, Q(C_f, Δ) and Q(Δ) are
transmitted to entropy coder 208.
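The scale factor interpolation and the coefficient quantization can be illustrated as follows. This sketch assumes the threshold scale factor is the square root of 12E and that the utilized scale factor is the geometric interpolation of the threshold and absolute scale factors with exponents alpha and 1 - alpha; NINT is implemented here as rounding half away from zero, and all names are hypothetical. The cost function FOO is embodiment-specific and is not modeled.

```python
import math


def threshold_scale_factor(energy_threshold: float) -> float:
    """Scale factor derived from the energy threshold E."""
    return math.sqrt(12.0 * energy_threshold)


def utilized_scale_factor(delta_0: float, delta_a: float, alpha: float) -> float:
    """Interpolate between the threshold and the absolute scale factor."""
    return delta_0 ** alpha * delta_a ** (1.0 - alpha)


def nint(x: float) -> int:
    """Nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_coefficient(c_f: float, delta: float) -> int:
    """Quantized spectral coefficient for a given utilized scale factor."""
    return nint(c_f / delta)
```

With alpha = 1 the utilized scale factor equals the threshold scale factor; smaller values of alpha move it toward the absolute scale factor.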

Advantageously, and depending on the relationship of the cost C to the permissible range PR, the
interpolation constant and bounds are adjusted until the utilized scale factor yields a quantized spectral coefficient
which has a cost within the permissible range. Illustratively, as shown in FIG. W at W13, the interpolation
bounds are manipulated to produce a binary search. Specifically,
when C > PR, α_high = α,
alternately,
when C < PR, α_low = α.
In either case, the interpolation constant is calculated by:

$$\alpha = \frac{\alpha_{low} + \alpha_{high}}{2}$$

The process then continues at W05 iteratively until C comes within the permissible range PR.

STEREOPHONIC DECODER

The stereophonic decoder has a very simple structure. Its main functions are reading the incoming
bitstream, decoding all the data, inverse quantization and reconstruction of RIGHT and LEFT channels. The
technique is represented in FIG. 12.

Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16
or DSP32C, and software performing the operations discussed below. Very large scale integration (VLSI)
hardware embodiments of the present invention, as well as hybrid DSP/VLSI embodiments, may also be provided.



Claims 

1. A method of coding a digital input signal to provide a coded digital output signal, the method comprising
the steps of:

sampling the digital input signal to create a frame of 2N input signal samples;
analyzing the frame of signal samples with an odd-frequency fast Fourier transform to provide a
frame of 2N Fourier coefficients; and

outputting a coded signal comprising samples X(k), each sample X(k) provided by multiplying the 

^ real part of a Fourier coefficient. F(k), by cos(^2!L±JX!-L±Qj. 
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0 DCT/IDCT processor and data processing method. 
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0 A one-dimensional discrete cosine transform (DCT) processor of N (N: positive integer>-term input data X 
includes a preprocessing section (1) for carrying out addition and subtraction of (i)th-term data x(i) and (N - i)th-term
data x(N - i) of input data X, and a unit (2, 3) for performing a product sum operation for sets of
intermediate data subjected to preprocessing by addition and sets of intermediate data subjected to
preprocessing by subtraction, respectively. The product sum operation unit includes a data rearranging unit (2) for
outputting, in parallel and in order, bit data of the same figure of a set of data, a partial sum generator (41) for
generating a partial sum by using the parallel bit data as an address, and an accumulator (42) for accumulating
outputs of the partial sum generator. A one-dimensional inverse discrete cosine transform (IDCT) processor of
N-term input data X includes a unit (2, 3) for performing a product sum operation of input data, and a
postprocessing section (7) for carrying out addition and subtraction of 2-term data in a predetermined
combination of an output of the product sum operation unit. The number of times of multiplication is reduced by
utilizing inherent characteristics of coefficients of DCT/IDCT processing. Since the product sum operation is
performed by a ROM table (43) and an adder (44), a faster multiplication is realized.

The present invention relates generally to data processors and data processing methods and, more
particularly, to an apparatus and method for carrying out discrete cosine transform or inverse cosine
transform of data.

In order to process video data at a high speed, high efficiency coding is carried out. In high efficiency
coding, the data amount of a digital video signal is compressed with picture quality being maintained as high
as possible. In high efficiency coding, a redundant component of the signal is first removed for efficient
coding. For this purpose, orthogonal transform techniques are often employed. As one of the orthogonal
transform techniques, discrete cosine transform (DCT) is provided. The DCT is implemented by a simple
product sum operation using a cosine function as a coefficient. The DCT is defined by the following
expression (1):



$$Y = AX \qquad (1)$$

where X is an N-term column vector indicating input data, Y is an N-term column vector indicating output
data, and A is an N by N coefficient matrix represented by the following expression.

The expression (1) represents a case where input data X is of N terms. 2^m points are generally
employed, where m is a natural number. A description will now be made on 8 point DCT where N = 8 (m
= 3). As can be seen from the expression (1), DCT is a matrix operation, and in practice, this processing is
realized by product sum operation.

Fig. 1 shows configuration of a conventional DCT processor. This DCT processor is described in, for
example, IEEE Proceedings of Custom Integrated Circuits Conference 89, 1989, pp. 24.4.1 to 24.4.4.
Referring to Fig. 1, the conventional DCT processor includes eight product sum operation units 100a to
100h arranged in parallel for calculating respective terms y0 to y7 of output data Y.

Each of product sum operation units 100a to 100h is of the same configuration and includes a parallel
multiplier 101 for taking a product of input data xi (i = 0 to 7) and a predetermined weighting coefficient,
and an accumulator 102 for accumulating an output of parallel multiplier 101 to generate output data yj (j =
0 to 7). Here, reference characters 101 and 102 generically denote respective components 101a to 101h
and 102a to 102h. In the following description also, reference numerals having no suffixes generically
denote corresponding elements.

Accumulator 102 includes a 2-input adder 103 for receiving an output of parallel multiplier 101 at its one
input, and an accumulating register 104 for latching an output of adder 103. An output of register 104 is
applied to an output terminal 106 and also to the other input of adder 103. Data yj of the respective terms of
output data Y are sequentially output through a selector (not shown) from output terminal 106. An operation
will now be described.

Identical data are applied through an input terminal 105 to product sum operation units 100a to 100h.
The following arithmetic operation is carried out in each of product sum operation units 100a - 100h:

$$y_j = \sum_{i=0}^{7} A(i, j)\, x_i = \frac{1}{2}\sum_{i=0}^{7} C(j)\cdot\cos\!\left(\frac{(2i+1)\,j\pi}{16}\right)\cdot x_i \qquad (2)$$

i, j = 0, 1, ... 7
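The product sum operation of expression (2) can be mimicked in software as a multiply-accumulate loop per output term. The sketch below is illustrative only: it assumes the usual DCT normalization C(0) = 1/sqrt(2) and C(j) = 1 otherwise, and the function names are chosen here.

```python
import math

N = 8


def c(j: int) -> float:
    """Normalization factor C(j); the value for j = 0 is an assumption."""
    return 1.0 / math.sqrt(2.0) if j == 0 else 1.0


def coeff(i: int, j: int) -> float:
    """Weighting coefficient A(i, j) of expression (2)."""
    return 0.5 * c(j) * math.cos((2 * i + 1) * j * math.pi / 16.0)


def dct8(x):
    """8-point DCT computed as eight independent product sum operations."""
    y = []
    for j in range(N):          # one product sum operation unit per output term
        acc = 0.0               # accumulating register starts reset
        for i in range(N):      # input data x0 ... x7 arrive one after another
            acc += coeff(i, j) * x[i]
        y.append(acc)
    return y
```

A constant input produces a single nonzero output term y0, which is a convenient check of the coefficients.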

For example, data y0 of a zeroth term in an output data vector Y is calculated as follows in product sum
operation unit 100a.

When receiving zeroth-term data x0 (hereinafter referred to simply as input data) in an input data
vector, parallel multiplier 101a outputs a product A(0, 0)*x0 of data x0 and a coefficient A(0, 0) to adder
103a. Register 104a is being reset, and the content thereof is 0. Accordingly, product A(0, 0)*x0 is output
from adder 103a and then stored in register 104a.

When input data x1 is applied, a product A(1, 0)*x1 is output from multiplier 101a. An output of adder
103a is A(0, 0)*x0 + A(1, 0)*x1 and stored in register 104a.

By repetition of such an operation, an output of accumulator 102a provided after application of input
data x7 is

$$\sum_{i=0}^{7} A(i, 0) \cdot x_i,$$

so that output data y0 is obtained.

Similar calculation (which differs merely in values of a weighting coefficient A(i, j)) is carried out also in
the remaining product sum operation units 100b - 100h, and output data y1 - y7 are obtained. These output
data y0 - y7 are sequentially output through output terminal 106.

In contrast to the DCT operation, there is an inverse DCT operation for carrying out the inverse
operation of the DCT operation. The inverse DCT (IDCT) is expressed as follows:

$$X = A^{-1} Y$$

where an input data vector X is obtained from an output data vector Y. That is, the only difference between
the DCT operation and the IDCT operation is the difference between coefficients A and A^{-1}. Thus, in the
configuration of Fig. 1, the IDCT operation can be carried out by changing the coefficients in parallel
multipliers 101a - 101h.

In other words, the DCT and the IDCT can be carried out on the same hardware. An increase in 
hardware is only concerned with a control circuit (not shown) for making a selection between a coefficient 
for DCT and that for IDCT. 

The above-described one-dimensional DCT operation can be expanded to a two-dimensional DCT
operation. The two-dimensional DCT operation is obtained by making both input data vector X and output
data vector Y be two-dimensional vectors.

Fig. 2 shows configuration of a conventional two-dimensional DCT (or IDCT) processor. Referring to Fig.
2, the processor includes a first one-dimensional DCT processing section 111a for subjecting input data
from input terminal 105 to one-dimensional DCT processing, a transposition circuit 112 for rearranging rows
and columns of an output of first one-dimensional DCT processing section 111a, and a second
one-dimensional DCT processing section 111b for subjecting an output of transposition circuit 112 to
one-dimensional DCT processing. First one-dimensional DCT processing section 111a performs a DCT (or
IDCT) operation in a row direction, and second one-dimensional DCT processing section 111b performs a
DCT (or IDCT) operation in a column direction.

Fig. 3 is a diagram showing configuration of the transposition circuit of Fig. 2. Referring to Fig. 3,
transposition circuit 112 includes a buffer memory 121 and an address generation circuit 122 for generating
write/read addresses of buffer memory 121. Buffer memory 121 receives output data of first
one-dimensional DCT processing section 111a through an input terminal 125 and sequentially stores the same
therein in accordance with an address signal from address generation circuit 122. Also, buffer memory 121
applies corresponding data from an output terminal 126 to second one-dimensional DCT processing section
111b in accordance with an address signal from address generation circuit 122. An operation will now be
described. Input data X and output data Y are two dimensional, the elements of which are each represented
by x(i, j) and y(i, j), i, j = 0, 1 ... 7.

Input data are applied in the order of rows to first one-dimensional DCT processing section 111a. More
specifically, input data are applied to input terminal 105 in the order of 8-term row vectors x(0, j), x(1, j), ...

First one-dimensional DCT processing section 111a performs the DCT operation for each row vector to
output intermediate data Z. At that time, first DCT processing section 111a outputs intermediate data of row
vectors in the order of rows, i.e., z(0, j), z(1, j), ... Accordingly, a DCT operation in the row direction of
input data X is carried out.

As shown in Fig. 3, transposition circuit 112 first stores the intermediate data from first DCT processing
section 111a into buffer memory 121 in the order in which the intermediate data are received (the order of
rows). Then, intermediate data Z are read in the order of columns, i.e., the order of column vectors z(i, 0),
z(i, 1), ... from buffer memory 121.

Intermediate data Z read in the order of columns are applied to second DCT processing section 111b.
Second DCT processing section 111b carries out one-dimensional DCT processing on the intermediate data.
Accordingly, data subjected to one-dimensional DCT processing in the column direction are output
from second one-dimensional DCT processing section 111b. Output data Y from second one-dimensional
DCT processing section 111b are output in the order of columns from output terminal 106. As a result, the
two-dimensional DCT shown by the following equation (3) is performed.

$$
Y_{uv} = \frac{1}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} C(u)\, C(v)\,
\cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}\; x_{ij} \qquad (3)
$$

$$
C(u),\ C(v) =
\begin{cases}
1/\sqrt{2} & (u, v = 0) \\
1 & (u, v \ne 0)
\end{cases}
$$
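
The row-column procedure described above can be checked numerically against equation (3). The following Python sketch is illustrative only: the function names (`dct_1d`, `transpose`, `dct_2d_row_column`, `dct_2d_direct`) are hypothetical, and floating-point arithmetic stands in for the finite word lengths of the hardware. It applies an eight-point one-dimensional DCT to every row, transposes the intermediate data, applies the same transform again, and compares the result with a direct evaluation of equation (3).

```python
import math

N = 8

def c(u):
    # Normalisation factor C(u) of equation (3).
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct_1d(v):
    # Eight-point 1-D DCT: y[u] = (1/2) C(u) sum_i v[i] cos((2i+1) u pi / 16).
    return [0.5 * c(u) * sum(v[i] * math.cos((2 * i + 1) * u * math.pi / 16)
                             for i in range(N))
            for u in range(N)]

def transpose(m):
    # Models the transposition circuit: data written by rows, read by columns.
    return [list(col) for col in zip(*m)]

def dct_2d_row_column(x):
    z = [dct_1d(row) for row in x]       # first 1-D stage, row direction
    zt = transpose(z)                    # intermediate data read in column order
    yt = [dct_1d(col) for col in zt]     # second 1-D stage, column direction
    return transpose(yt)                 # restore Y[u][v] indexing

def dct_2d_direct(x):
    # Direct evaluation of equation (3), 64 products per output sample.
    return [[0.25 * c(u) * c(v) * sum(
                 x[i][j]
                 * math.cos((2 * i + 1) * u * math.pi / 16)
                 * math.cos((2 * j + 1) * v * math.pi / 16)
                 for i in range(N) for j in range(N))
             for v in range(N)]
            for u in range(N)]
```

The two results agree to within rounding error, which is why a two-dimensional processor can be built from two one-dimensional sections and a transposition circuit.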

First and second DCT processing sections 111a and 111b carry out the same processing except for
coefficients in the parallel multiplying circuits. If multiplication coefficients of first and second DCT
processing sections 111a and 111b are changed, the two-dimensional IDCT shown by the following equation (4)
is carried out.

$$
x_{ij} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\,
\cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}\; Y_{uv} \qquad (4)
$$



The DCT processing and IDCT processing shown above include a product sum operation. The product
operation of this product sum operation is carried out by the parallel multipliers shown in Fig. 1. A multiplier
in general requires a large number of adders and the like and has a large scale. Thus, there is a
disadvantage that a conventional DCT processor requiring a plurality of parallel multipliers cannot be
reduced in size.

In a semiconductor integrated circuit that operates synchronously, the upper limit of operation speed is
determined by the worst delay path (the path which provides the maximum delay). In the conventional
configuration, the worst delay path is established by a parallel multiplier, and the operation speed depends
on the processing speed of the parallel multiplier. It is thus difficult to implement fast DCT processing and
fast IDCT processing.

One object of the present invention is to provide a down-sized data processor which operates at a high
speed.

Another object of the present invention is to provide a data processor for carrying out at least one of
DCT and IDCT at a high speed.

A further object of the present invention is to provide a data processing method for carrying out at least
one of DCT and IDCT at a high speed.

A data processor according to the present invention reduces the number of multiplications by
utilizing characteristics inherent to a DCT operation or IDCT operation. A product sum operation is carried
out by a successive operation employing a combination of a memory and an adder.

Since the number of multiplications is reduced and no parallel multipliers are employed, the
DCT operation and IDCT operation are carried out at a high speed with fewer circuit components.

The foregoing and other objects, features, aspects and advantages of the present invention will become
more apparent from the following detailed description of the present invention when taken in conjunction
with the accompanying drawings.

Fig. 1 is a diagram showing configuration of a conventional one-dimensional DCT processor.

Fig. 2 is a diagram showing configuration of a conventional two-dimensional DCT processor.

Fig. 3 is a diagram showing configuration of a transposition circuit of Fig. 2.

Fig. 4 is a diagram showing configuration of a one-dimensional DCT processor being one embodiment
of the present invention.

Fig. 5 is a diagram showing an example of configuration of a preprocessing section shown in Fig. 4.

Fig. 6 is a diagram showing an example of modification of the preprocessing section of Fig. 5.

Fig. 7A is a diagram showing an example of configuration of a data rearranging circuit of Fig. 4.

Fig. 7B is a diagram showing the contents of a shift register of Fig. 7A.

Fig. 8 is a diagram showing an example of configuration of a product sum operation circuit of Fig. 4.

Fig. 9 is a diagram of an example of modification of the product sum operation circuit of Fig. 8.

Fig. 10 is a diagram showing an example of modification of the one-dimensional DCT processor.

Fig. 11 is a diagram showing configuration of a one-dimensional IDCT processor being another
embodiment of the present invention.

Fig. 12 is a diagram showing configuration of a one-dimensional DCT/IDCT processor being still another
embodiment of the present invention.

Fig. 13 is a diagram showing configuration of a two-dimensional DCT processor being still another
embodiment of the present invention.

Fig. 14 is a diagram showing configuration of a two-dimensional IDCT processor being still another
embodiment of the present invention.

Fig. 15 is a diagram showing configuration of a two-dimensional DCT/IDCT processor being still another
embodiment of the present invention.

Fig. 16 is a diagram showing configuration of a semiconductor integrated circuit apparatus including the
DCT processor of the present invention.

Fig. 17 is a diagram showing an example of modification of the semiconductor integrated circuit of Fig. 16.

Fig. 4 schematically shows configuration of a one-dimensional DCT processor being one embodiment
of the present invention.

Referring to Fig. 4, the processor includes a preprocessing section 1 for receiving input data xi from an
input terminal 4 to preprocess the received input data xi on the basis of characteristics inherent to DCT
operation, a data rearranging circuit 2 for rearranging data output from preprocessing section 1, and a
product sum operation section 3 for carrying out a product sum operation on data from data rearranging
circuit 2.

This processor carries out an eight-point DCT operation. Thus, product sum operation section 3
includes eight product sum operation circuits 6a - 6h. Respective product sum operation circuits 6a - 6h
provide respective output data y0, y2, y4, y6, y1, y3, y5 and y7 to sequentially apply the output data to an
output terminal 5 (the sequential application unit is not shown in the figure).

A description will now be made on the principle of the 8-point one-dimensional DCT processing
operation of the present invention before a detailed description of configuration of each section. If the
relationship between input data xi (i = 0, 1, ..., 7) and output data yj (j = 0, 1, ..., 7) shown in equations (1)
and (2) is expressed in a matrix form, the following representation (5) is obtained:
where

$$
A = \tfrac{1}{2}\cos\tfrac{\pi}{4},\quad
B = \tfrac{1}{2}\cos\tfrac{\pi}{8},\quad
C = \tfrac{1}{2}\sin\tfrac{\pi}{8},\quad
D = \tfrac{1}{2}\cos\tfrac{\pi}{16},
$$
$$
E = \tfrac{1}{2}\cos\tfrac{3\pi}{16},\quad
F = \tfrac{1}{2}\sin\tfrac{3\pi}{16},\quad
G = \tfrac{1}{2}\sin\tfrac{\pi}{16}
$$

In derivation of the above relation (5), the well-known characteristics of trigonometric functions such as
cos(π/4) = 1/√2, cos(π ± θ) = -cos θ, cos(π/2 ± θ) = ∓sin θ and the like are utilized. For example,
cos(3π/8) = sin(π/8) and the like are utilized.

In the above relation (5), the coefficient matrix is horizontally symmetrical with respect to columns. By use
of this symmetry, relation (5) can be transformed to the following representation (6):
If a comparison is made between the above relations (5) and (6), it is apparent that the number of
multiplications for acquiring output data yj is reduced to a half in relation (6) as compared to relation (5).
DCT processing in accordance with relation (6) is carried out in this embodiment.
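
The halving of the multiplication count can be illustrated with a short Python sketch. The two 4 x 4 coefficient blocks below (`EVEN` and `ODD`, both hypothetical names) are not copied from relation (6), which is not reproduced in this text; they are reconstructed here from the coefficient definitions A to G and the eight-point DCT definition, so they should be read as an assumption. The sketch folds the input into sums and differences, applies the two blocks, and compares the result with a direct 8 x 8 evaluation.

```python
import math

A = 0.5 * math.cos(math.pi / 4)
B = 0.5 * math.cos(math.pi / 8)
C = 0.5 * math.sin(math.pi / 8)
D = 0.5 * math.cos(math.pi / 16)
E = 0.5 * math.cos(3 * math.pi / 16)
F = 0.5 * math.sin(3 * math.pi / 16)
G = 0.5 * math.sin(math.pi / 16)

# Reconstructed coefficient blocks (assumption, see lead-in).
EVEN = [[A, A, A, A],      # y0
        [B, C, -C, -B],    # y2
        [A, -A, -A, A],    # y4
        [C, -B, B, -C]]    # y6
ODD = [[D, E, F, G],       # y1
       [E, -G, -D, -F],    # y3
       [F, -D, G, E],      # y5
       [G, -F, E, -D]]     # y7

def dct8_direct(x):
    # Full 8 x 8 matrix product: 64 multiplications.
    out = []
    for j in range(8):
        cj = 1 / math.sqrt(2) if j == 0 else 1.0
        out.append(0.5 * cj * sum(
            x[i] * math.cos((2 * i + 1) * j * math.pi / 16) for i in range(8)))
    return out

def dct8_folded(x):
    # Preprocessing: four sums and four differences, no multiplication.
    z = [x[k] + x[7 - k] for k in range(4)]
    w = [x[k] - x[7 - k] for k in range(4)]
    # Two 4 x 4 products: 32 multiplications in total.
    even = [sum(coef * v for coef, v in zip(row, z)) for row in EVEN]
    odd = [sum(coef * v for coef, v in zip(row, w)) for row in ODD]
    y = [0.0] * 8
    y[0::2] = even
    y[1::2] = odd
    return y
```

Both functions return the same eight outputs up to rounding, while the folded form needs 32 multiplications instead of 64.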

With reference to Fig. 4, preprocessing section 1 generates the following eight intermediate data from
input data xi sequentially applied from input terminal 4 by selectively carrying out addition or subtraction.
The intermediate data are:

(x0 + x7), (x1 + x6), (x2 + x5), (x3 + x4),
(x0 - x7), (x1 - x6), (x2 - x5) and (x3 - x4).

The result of preprocessing from preprocessing section 1 is represented in finite word length. In the
following description, it is assumed that the preprocessing result is indicated by 8-bit data in two's
complement notation.

In order to calculate output data yj by using the preprocessed data from preprocessing section 1, the
matrix operation of relation (6) is carried out.

With respect to output data y2, for example, the following relation (7) is carried out:

$$
y_2 = B\,(x_0 + x_7) + C\,(x_1 + x_6) - C\,(x_2 + x_5) - B\,(x_3 + x_4)
    = \sum_{k=1}^{4} B_k \cdot z_k \qquad (7)
$$

where

$$
B_1 = B,\quad B_2 = C,\quad B_3 = -C,\quad B_4 = -B
$$
$$
z_1 = (x_0 + x_7),\quad z_2 = (x_1 + x_6),\quad z_3 = (x_2 + x_5),\quad z_4 = (x_3 + x_4)
$$

Data rearranging circuit 2 of Fig. 4 receives preprocessing results zk (k = 1, ..., 4) from preprocessing
section 1. When receiving the four necessary preprocessing results zk, data rearranging circuit 2 outputs the
least significant bits of the respective four preprocessing results in parallel to product sum operation section
3. The parallel bit output proceeds sequentially, one bit position at a time, until the most significant bit
is output.

Product sum operation circuit 6a for output data y2 carries out an operation in accordance with a
relation (8) which is a further equivalent transformation of relation (7).

$$
y_2 = \sum_{n=1}^{7} \Bigl[ \sum_{k=1}^{4} B_k \cdot z_{kn} \Bigr] \cdot 2^{-n}
    + \sum_{k=1}^{4} B_k \cdot (-z_{k0}) \qquad (8)
$$

where zkn is nth-bit data of preprocessing result zk, and zk0 is the most significant bit of zk. That is,
zk<0:7> = (zk0, zk1, ..., zk7). Data zk is obtained in the following relation (9):

$$
z_k = -z_{k0} + \sum_{n=1}^{7} z_{kn} \cdot 2^{-n} \qquad (9)
$$

It should be noted that data zk is data of 8 bits represented in two's complement notation. Therefore,
equations (7) and (8) are mathematically totally equivalent to each other except for a difference in order of
product sum operations.

4-bit data z1n, z2n, z3n and z4n are applied in parallel to product sum operation circuit 6a for data y2. The
values of coefficients B1, B2, B3 and B4 can be calculated in advance. Product sum operation circuit 6a
stores therein a partial sum (10) shown below in the form of a ROM table and outputs a corresponding
partial sum with 4-bit data z1n, z2n, z3n and z4n used as an address.

$$
\sum_{k=1}^{4} B_k \cdot z_{kn} \quad (n = 0, \ldots, 7) \qquad (10)
$$

This partial sum is accumulated by an internal accumulator, so that output data y2 is output from terminal 5.
Although the sign of the most significant bit zk0 is negative, this sign can be converted to be positive for
addition operation in two's complement notation.
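
As an illustration of relations (8) to (10), the following Python sketch evaluates y2 from a 16-word lookup table instead of multipliers. It is a sketch under stated assumptions: the names `ROM`, `bit`, `y2_distributed` and `y2_direct` are hypothetical, the assignment of z1 to z4 to address bits 0 to 3 is an arbitrary choice, and floating-point table entries stand in for the fixed-point words a real ROM would hold. Each 8-bit word is passed as an integer q in [-128, 127] representing the value q / 128, which matches relation (9).

```python
import math

B = 0.5 * math.cos(math.pi / 8)
C = 0.5 * math.sin(math.pi / 8)
COEFFS = [B, C, -C, -B]          # B1..B4 of relation (7)

# ROM table of relation (10): address bit k (k = 0..3) carries bit z_{k+1,n}.
ROM = [sum(COEFFS[k] for k in range(4) if (addr >> k) & 1)
       for addr in range(16)]

def bit(q, n):
    # n-th bit of an 8-bit two's complement word; n = 0 is the sign bit (MSB).
    return ((q & 0xFF) >> (7 - n)) & 1

def y2_distributed(zq):
    # zq: four integers in [-128, 127]; the represented value is zq[k] / 128.
    acc = 0.0
    for n in range(1, 8):                      # magnitude bits, weight 2^-n
        addr = sum(bit(zq[k], n) << k for k in range(4))
        acc += ROM[addr] * 2.0 ** (-n)
    sign_addr = sum(bit(zq[k], 0) << k for k in range(4))
    return acc - ROM[sign_addr]                # sign bits carry weight -1

def y2_direct(zq):
    # Relation (7) with four explicit multiplications, for comparison.
    return sum(COEFFS[k] * zq[k] / 128 for k in range(4))
```

For any four in-range words the two functions agree to within rounding, which is the equivalence between relations (7) and (8) stated above.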

Referring to Fig. 4, in product sum operation section 3, product sum operation circuits 6a - 6d apply the
operation to the same data zk in parallel to produce output data y0, y2, y4 and y6. Product sum operation
circuits 6e - 6h apply the operation to the same data wk in parallel to produce output data y1, y3, y5 and
y7.

Output data y0 to y7 are sequentially output in this order from output terminal 5 by a selector not
shown. A description will now be given on a detailed configuration of each section shown in Fig. 4.

Fig. 5 shows configuration of preprocessing section 1 shown in Fig. 4. Preprocessing section 1 includes
an input circuit 21 for receiving input data xi sequentially applied from input terminal 4. Input circuit 21
outputs input data xp and xq in a predetermined combination under control by a control circuit 25. Here, a
relation p + q = 7 is satisfied. Input circuit 21 can be formed of a tapped shift register. Data at a desired
stage can be read by selecting a tap under control by control circuit 25 by employing, for example, a
multiplexer.

Preprocessing section 1 further includes a 2-input adder 22 for adding outputs of input circuit 21, a
subtractor 23 for subtracting outputs of input circuit 21, and an output circuit 24 for selecting one of
respective outputs of adder 22 and subtractor 23 under control by control circuit 25. Adder 22 and
subtractor 23 carry out addition and subtraction for the applied data under control by control circuit 25.
Output circuit 24 preferably alternately selects the output of adder 22 and that of subtractor 23. An
operation will now be described.

Input circuit 21 receives input data X to sequentially output sets of data (x0, x7), (x1, x6), (x2, x5), and
(x3, x4).

Adder 22 adds the data of each set. Adder 22 sequentially outputs data zk, i.e., (x0 + x7), (x1 + x6),
(x2 + x5) and (x3 + x4).

Subtractor 23 sequentially outputs data wk, i.e., (x0 - x7), (x1 - x6), (x2 - x5) and (x3 - x4).
Output circuit 24 alternately outputs data zk and data wk.

Product sum operation circuits 6a to 6d of Fig. 4 carry out an operation in accordance with data zk, while
product sum operation circuits 6e to 6h carry out an operation in accordance with data wk.
Because output circuit 24 alternately outputs addition data zk and subtraction data wk, it is possible to
produce output data y0 to y7 in this order from product sum operation section 3 and implement a pipelined
architecture for processing data in synchronization with a clock signal.

In that case, it is unnecessary that adder 22 and subtractor 23 carry out an arithmetic operation
simultaneously. Accordingly, as shown in Fig. 6, an arithmetic unit 26 for alternately performing the adding
processing and the subtracting processing under control by control circuit 25 may be employed. Output
circuit 24 does not have to have a selecting function in the configuration of Fig. 6. Output circuit 24 is
required to have a function of buffering and latching (in the case of a clock synchronizing operation) an
output of arithmetic unit 26. In the configuration of Fig. 6, since the addition and subtraction are carried out
in a single arithmetic unit 26, the circuit scale is reduced.
Such configuration may be employed that intermediate data wk is output after all intermediate data zk
are output from preprocessing section 1.
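
The two output orders described for the preprocessing section can be sketched in a few lines of Python. The function name `preprocess` and its `interleave` flag are hypothetical; the sketch only models the data ordering, not the shift register, multiplexer or clocking.

```python
def preprocess(x, interleave=True):
    # Pair x[p] with x[q] such that p + q = 7, as the input circuit does.
    pairs = [(x[k], x[7 - k]) for k in range(4)]
    z = [p + q for p, q in pairs]      # adder outputs z1..z4
    w = [p - q for p, q in pairs]      # subtractor outputs w1..w4
    if interleave:
        # Alternate sum and difference, as a single add/subtract unit would.
        out = []
        for zk, wk in zip(z, w):
            out += [zk, wk]
        return out
    # Alternative order: all sums first, then all differences.
    return z + w
```

With `x = [0, 1, ..., 7]`, the interleaved order is `[7, -7, 7, -5, 7, -3, 7, -1]` and the grouped order is `[7, 7, 7, 7, -7, -5, -3, -1]`.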

Fig. 7A shows configuration of data rearranging circuit 2 of Fig. 4. Data rearranging circuit 2 includes an
input circuit 31 for receiving intermediate data from a terminal 500, a shift register 32 for sequentially
storing therein data from input circuit 31, and a selector 33 for sequentially reading four intermediate data
stored in shift register 32 from the least significant bit.

When intermediate data zk and intermediate data wk are received alternately, input circuit 31 first outputs
all of intermediate data zk and then sequentially outputs intermediate data wk. This configuration can
easily be implemented by using a register for storing intermediate data wk therein.

When intermediate data wk are applied after all of intermediate data zk are applied, input circuit 31
sequentially outputs intermediate data from terminal 500. Input circuit 31, however, has a function of
latching intermediate data wk until the reading of intermediate data zk by the selector is completed. A data
acceptance, latching and output operation of input circuit 31 is controlled by a control circuit 34.

Shift register 32 stores therein four intermediate data from input circuit 31. Shift register 32 includes
four 8-bit registers 32a-32d in a row direction as shown in Fig. 7B. Fig. 7B shows a state where four
intermediate data z1 to z4 are stored in shift register 32.

Intermediate data from input circuit 31 are sequentially stored in registers 32a - 32d. After all
intermediate data z1 to z4 are stored in shift register 32, data of registers 32a - 32d are read in parallel
sequentially from the respective least significant bits. Such configuration can be implemented by shift
registers capable of shifting in both row and column directions. Even by use of a shift register capable of
shifting only in the row direction, if a register stage is selected by selector 33, the data rearranging
operation can be realized.

A data bit shifting operation of shift register 32 is controlled by control circuit 34. Selector 33 reads data
of 4 bits in parallel from shift register 32 under control by control circuit 34.

Four-bit data zkn are output from a terminal 501 in the configuration of Fig. 7A.
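
The rearrangement performed by the shift register and selector amounts to transposing four 8-bit words into eight 4-bit slices. The following Python sketch shows the idea; the function name `bit_slices` is hypothetical, and placing word k in address bit k is an assumed convention rather than something stated in the text.

```python
def bit_slices(words, width=8):
    # Return one 4-bit address per bit position, least significant bit first.
    mask = (1 << width) - 1
    slices = []
    for pos in range(width):                   # pos 0 = least significant bit
        addr = 0
        for k, w in enumerate(words):
            # Mask first so negative values use their two's complement bits.
            addr |= (((w & mask) >> pos) & 1) << k
        slices.append(addr)
    return slices
```

For example, `bit_slices([0b00000001, 0b00000011, 0, 0b10000000])` yields `[3, 2, 0, 0, 0, 0, 0, 8]`: the first slice collects the four least significant bits and the last slice collects the four sign bits.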
Fig. 8 shows configuration of product sum operation circuit 6. Referring to Fig. 8, product sum operation
circuit 6 includes a partial sum generating circuit 41 for generating a partial sum in accordance with data
from terminal 501, and an accumulator 42 for accumulating an output of partial sum generating circuit 41.

Partial sum generating circuit 41 includes a ROM (Read Only Memory) 43 for receiving 4-bit data zkn
as an address signal. ROM 43 stores the partial sum shown in, for example, equation (10) in the form
of a table and, when supplied with 4-bit data zkn, ROM 43 outputs a corresponding value. By constructing this
partial sum generating circuit 41 in the form of the ROM table, a partial sum can be generated at a high
speed without any multiplication.

Accumulator 42 includes an adder 44 for receiving a partial sum from partial sum generating circuit 41
at its one input, an accumulating register 45 for storing an output of adder 44, and a shifter 46 for shifting
an output of register 45 by predetermined bits to apply the shifted output to the other input of adder 44.
Output data yj is applied from shifter 46 to terminal 5. A description will now be made on an operation
thereof, taking output data y2 as an example.

Four-bit data zkn are applied in turn from the least significant bit to partial sum generating circuit 41.
Partial sum generating circuit 41 sequentially outputs a partial sum

$$
\sum_{k=1}^{4} B_k \cdot z_{kn}
$$

from ROM 43.

First, a partial sum

$$
\sum_{k=1}^{4} B_k \cdot z_{k7}
$$

is stored in register 45.
Then, a partial sum

$$
\sum_{k=1}^{4} B_k \cdot z_{k6}
$$

is output from partial sum generating circuit 41.

Shifter 46 shifts the contents of register 45 by one bit. Accordingly, an output of shifter 46 is shown as
below:

$$
\Bigl( \sum_{k=1}^{4} B_k \cdot z_{k7} \Bigr) \cdot 2^{-1}
$$

The output of adder 44 is shown as below:

$$
\sum_{k=1}^{4} B_k \cdot z_{k6} + \Bigl( \sum_{k=1}^{4} B_k \cdot z_{k7} \Bigr) \cdot 2^{-1}
$$

By sequentially repeating this operation, the following output (11) is stored in register 45.

$$
\sum_{n=1}^{7} \Bigl[ \sum_{k=1}^{4} B_k \cdot z_{kn} \Bigr] \cdot 2^{-(n-1)} \qquad (11)
$$

When data zk0 is applied, the contents of register 45 become the value shown in relation (8), since the data
represented by the above relation (11) is shifted by one bit and then added by adder 44. After that, the
shifting operation by shifter 46 is stopped and the contents of register 45 are read, whereby output data y2 is
obtained.
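
The shift-and-add sequence of accumulator 42 can be sketched as follows. This is an illustrative Python model only: the names `accumulate_lsb_first` and `relation_8` are hypothetical, the partial sums are passed in as a list rather than read from a ROM, and the sign-bit partial sum is subtracted explicitly, which is one way of handling the negative weight of the most significant bit.

```python
def accumulate_lsb_first(partial_sums):
    # partial_sums[n] is the table output for bit n; n = 0 is the sign bit
    # and n = 7 is the least significant bit.
    register = 0.0
    for n in range(7, 0, -1):                  # bits 7 (LSB) down to 1
        # The shifter halves the register contents; the adder adds the new sum.
        register = partial_sums[n] + register / 2
    # Final step: shift once more and apply the sign-bit partial sum.
    return register / 2 - partial_sums[0]

def relation_8(partial_sums):
    # Direct evaluation of relation (8) for comparison.
    return (sum(partial_sums[n] * 2.0 ** (-n) for n in range(1, 8))
            - partial_sums[0])
```

After the loop the register holds the sum of partial sums weighted by 2^-(n-1); the final halving brings the weights to 2^-n, so the two functions agree for any list of eight partial sums.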

The operation of partial sum generating circuit 41 and accumulator 42 is controlled by control circuit
47.

Fig. 9 shows another configuration of a product sum operation circuit. The product sum operation circuit
shown in Fig. 9 is different from the configuration shown in Fig. 8 in that a partial sum generating circuit 41
includes two ROMs 43a and 43b and an adder 48 for adding outputs of ROMs 43a and 43b.

ROM 43a receives higher order bits, while ROM 43b receives lower order bits. In this configuration,
partial sums P and Q shown in the following equations are made by ROMs 43a and 43b.

$$
P = \sum_{k=1}^{2} B_k \cdot z_{kn}
$$
$$
Q = \sum_{k=3}^{4} B_k \cdot z_{kn}
$$

In the configuration of Fig. 9, the number of words to be stored into the ROMs is drastically reduced. This is
because the number of words to be stored is determined by the number of bits of an address signal and
grows as two raised to the power of that bit number.
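
The saving in table size can be made concrete with a small Python sketch. The names `build_rom`, `ROM_SINGLE`, `ROM_A`, `ROM_B` and `partial_sum_split` are hypothetical, and which address bits go to which ROM is an assumed layout; the point is only that two tables addressed by two bits each hold 4 + 4 words, whereas one table addressed by four bits holds 16.

```python
import math

B = 0.5 * math.cos(math.pi / 8)
C = 0.5 * math.sin(math.pi / 8)
COEFFS = [B, C, -C, -B]                # B1..B4 of relation (7)

def build_rom(coeffs):
    # One word per address; address bit k selects coeffs[k].
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(2 ** len(coeffs))]

ROM_SINGLE = build_rom(COEFFS)         # 4 address bits -> 16 words
ROM_A = build_rom(COEFFS[:2])          # partial sum P, 2 address bits -> 4 words
ROM_B = build_rom(COEFFS[2:])          # partial sum Q, 2 address bits -> 4 words

def partial_sum_split(addr):
    # Look up P and Q separately and add them, as adder 48 would.
    p = ROM_A[addr & 0b11]
    q = ROM_B[(addr >> 2) & 0b11]
    return p + q
```

The split tables reproduce every entry of the single table at the cost of one extra addition per lookup. The saving grows quickly with address width: an 8-bit address would need 256 words in one table but only 16 + 16 when split in half.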

In the above configuration, product sum operation section 3 includes eight product sum operation
circuits 6a - 6h. Intermediate data zk and wk are not calculated simultaneously. When intermediate data wk
are calculated after all intermediate data zk are calculated and then output data y0, y2, y4 and y6 are
calculated, product sum operation section 3 can be formed of four product sum operation circuits 6a - 6d as
shown in Fig. 10.

Product sum operation circuits 6a - 6d calculate y0 and y1, y2 and y3, y4 and y5, and y6 and y7,
respectively. The contents of the ROM for partial sum generation are changed in accordance with
intermediate data zk and wk. If the ROM is structured in bank architecture, the change of the coefficient
table can easily be realized.

A description will now be made on a structure for an IDCT operation with reference to Fig. 11. Referring
to Fig. 11, an 8-point one-dimensional IDCT processor includes a data rearranging circuit 2 for rearranging
data from an input terminal 4, a product sum operation section 3 for performing a product sum operation
in accordance with an output of data rearranging circuit 2, and a postprocessing section 7 for carrying out
addition and subtraction of a predetermined combination of outputs of product sum operation section 3.

Data rearranging circuit 2 and postprocessing section 7 are of the same configurations as those of data
rearranging circuit 2 and preprocessing section 1 of Fig. 4, respectively. An operation will now be
described.

Input data yj (j = 0, 1, ..., 7) applied to terminal 4 undergoes an IDCT processing, so that output data xi
(i = 0, 1, ..., 7) is transmitted to terminal 5. The relationship between data yj and xi is represented in
matrix form (12), where

$$
A = \tfrac{1}{2}\cos\tfrac{\pi}{4},\quad
B = \tfrac{1}{2}\cos\tfrac{\pi}{8},\quad
C = \tfrac{1}{2}\sin\tfrac{\pi}{8},\quad
D = \tfrac{1}{2}\cos\tfrac{\pi}{16},
$$
$$
E = \tfrac{1}{2}\cos\tfrac{3\pi}{16},\quad
F = \tfrac{1}{2}\sin\tfrac{3\pi}{16},\quad
G = \tfrac{1}{2}\sin\tfrac{\pi}{16}
$$

This coefficient matrix is a transposed matrix of the coefficient matrix of equation (5). If the symmetry with
respect to rows of the coefficient matrix of expression (12) is utilized, expression (12) is changed to an
equivalent expression (13).

It should be noted that only two types of coefficient matrix appear in expression (13). Assume that
these two types are M and N. The processor shown in Fig. 11 carries out an IDCT operation in accordance
with expression (13).

Data rearranging circuit 2 receives data yj (j = 0, 1, ..., 7) from terminal 4 to rearrange data y0, y2, y4
and y6 and sequentially output the rearranged data from the least significant bit. That is, 4-bit data y0n, y2n,
y4n and y6n (n = 0, 1, ..., 7) of input data yj are output from data rearranging circuit 2, so that generation and
accumulation of partial sums are carried out. As intermediate data, a product sum of the same form as
relation (8) is output, accumulated over bits n = 1 to 7 and over the four coefficients k = 1 to 4.

This result corresponds to, for example, an intermediate term M2 = (A • y0 - C • y2 - A • y4 + B • y6)
for x2. Then, data y1n, y3n, y5n and y7n (n = 0, 1, ..., 7) are output from data rearranging circuit 2. The bit
data is subjected to a product sum operation in product sum operation section 3. Accordingly, the
remaining terms are obtained. For example, an intermediate term N2 = (F • y1 - D • y3 + G • y5 + E •
y7) for x2 is obtained. Intermediate terms Mi (i = 0, 1, ..., 7) and Ni (i = 0, 1, ..., 7) are output in turn from
product sum operation section 3. From expression (13), the following relations are satisfied: Mi = M(7-i) and
Ni = N(7-i).

Data rearranging circuit 2 may alternately output data bits (y0, y2, y4, y6) and data bits (y1, y3, y5, y7).
Each of product sum operation circuits 6a - 6d calculates Mi (= M(7-i)), and each of product sum operation
circuits 6e - 6h calculates Ni (= N(7-i)).

Postprocessing section 7 obtains a sum of and a difference between intermediate data Mi and Ni to
generate output data xi and output the same to terminal 5. Accordingly, the following relation is obtained:

xi = Mi + Ni (i = 0, 1, 2, 3)
xi = Mi - Ni (i = 4, 5, 6, 7)

Postprocessing section 7 has the same configuration as that of Fig. 5 or 6. In that case, input circuit 21
sequentially or alternately receives intermediate terms Mi (i = 0 to 3), Ni (i = 0 to 3) to apply a desired
combination of the terms to adder/subtractors 22, 23 (or 26). The order in which data are selected in the
input circuit is determined by control circuit 25. In this case, data may be applied in the order of x0, x7, x1, x6,
x2, x5, x3, x4 to output circuit 24, and output circuit 24 may output the data in the order of x0, x1, ..., x7.
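
The postprocessing step can be checked with a short Python sketch. The function names (`coeff`, `intermediate_terms`, `idct8_folded`, `idct8_direct`) are hypothetical, and the coefficients are computed from the eight-point IDCT definition rather than read from expressions (12) and (13), which are not reproduced in this text. In the sketch, only M0 to M3 and N0 to N3 are formed, and the difference Mi - Ni is written to output index 7 - i, which is one reading of the relations above under Mi = M(7-i) and Ni = N(7-i).

```python
import math

def c(j):
    return 1 / math.sqrt(2) if j == 0 else 1.0

def coeff(j, i):
    # Coefficient linking input y[j] to output x[i] in the eight-point IDCT.
    return 0.5 * c(j) * math.cos((2 * i + 1) * j * math.pi / 16)

def intermediate_terms(y):
    # M_i uses the even-indexed inputs, N_i the odd-indexed inputs (i = 0..3).
    m = [sum(coeff(j, i) * y[j] for j in (0, 2, 4, 6)) for i in range(4)]
    n = [sum(coeff(j, i) * y[j] for j in (1, 3, 5, 7)) for i in range(4)]
    return m, n

def idct8_folded(y):
    m, n = intermediate_terms(y)
    x = [0.0] * 8
    for i in range(4):
        x[i] = m[i] + n[i]          # postprocessing: sum
        x[7 - i] = m[i] - n[i]      # postprocessing: difference
    return x

def idct8_direct(y):
    # Full 8 x 8 evaluation for comparison.
    return [sum(coeff(j, i) * y[j] for j in range(8)) for i in range(8)]
```

For i = 2 the computed terms reproduce M2 = A • y0 - C • y2 - A • y4 + B • y6 and N2 = F • y1 - D • y3 + G • y5 + E • y7, and the folded result matches the direct evaluation to within rounding.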

This one-dimensional IDCT processor can also be structured such that a single product sum operation
circuit 6 calculates both intermediate terms Mi and Ni (i = 0 to 3).

The product sum operation in DCT processing and that in IDCT processing are identical to each other
except for their coefficient matrixes. Accordingly, as shown in Fig. 12, a processor capable of selectively
performing the DCT processing and the IDCT processing is obtained.

Referring to Fig. 12, the processor includes a preprocessing section 1, a data rearranging circuit 2, a
product sum operation section 3, a postprocessing section 7 and a control circuit 8.

Preprocessing section 1 has its input connected to an input terminal 4 and its output connected to data 
rearranging circuit 2. Data rearranging circuit 2 has its output connected to product sum operation section 3. 
Product sum operation section 3 has its output connected to an input of postprocessing section 7. The 
output of postprocessing section 7 is supplied through an output terminal 5. Product sum operation section
3 includes first to eighth product sum operation circuits 6a - 6h.

Control circuit 8 switches DCT operation and IDCT operation and also controls the operation of the 
respective circuits. 

A description will now be made on an operation of the processor shown in Fig. 12. In the case of DCT
processing, data passes unchanged through postprocessing section 7. This causes the processor of
Fig. 12 to function in the same way as the DCT processor shown in Fig. 4. That is, data input from input
terminal 4 undergoes addition/subtraction in preprocessing section 1 and is then rearranged in data
rearranging circuit 2. The rearranged data is then transmitted in turn from lower order bits to the product
sum operation section. The data subjected to a product sum operation shown in, for example, expression (5)
in the product sum operation section passes through postprocessing section 7 and is then directly output
from output terminal 5.

In the case of inverse DCT processing, data passes unchanged through preprocessing section 1, whereby
the processor functions in the same way as the inverse DCT processor shown in Fig. 11, as follows. That is,
the data input from input terminal 4 passes unchanged through preprocessing section 1 and is then
rearranged in data rearranging circuit 2. The rearranged data is transmitted in turn from lower order bits to
the product sum operation section. The data subjected to the product sum operation in the product sum
operation section is transmitted to postprocessing section 7 and then subjected to addition/subtraction for
calculating output data. The added/subtracted data is output from output terminal 5.

Changes in coefficients in product sum operation section 3 are made by control circuit 8. This is easily
realized by switching of banks of ROM or the like.

The above-described processor performs a one-dimensional DCT or IDCT operation. This processor
can be developed to be able to perform a two-dimensional DCT or IDCT operation.

Fig. 13 shows configuration of a two-dimensional DCT processor according to the present invention.
Referring to Fig. 13, the two-dimensional DCT processor includes a first one-dimensional DCT processing
section 11a, a second one-dimensional DCT processing section 11b and a transposition circuit 12.

First one-dimensional DCT processing section 11a carries out DCT processing with respect to rows,
while second one-dimensional DCT processing section 11b carries out DCT processing with respect to
columns. Transposition circuit 12 outputs in the order of columns the data applied in the order of rows. First
and second processing sections 11a and 11b have the same configuration as that of the one-dimensional
DCT processor shown in Fig. 4 and include a preprocessing section 1 (1a, 1b), a data rearranging circuit 2
(2a, 2b) and a product sum operation section 3 (3a, 3b). A description will now be made on a
two-dimensional DCT processing of 8 x 8 points taken as an example.

If expression (3) is rewritten, the following expression (14) is obtained:

$$
Y_{uv} = \sum_{j=0}^{7} A(j, v) \cdot \Bigl( \sum_{i=0}^{7} A(i, u) \cdot x_{ij} \Bigr) \qquad (14)
$$

Input terminal 4 is supplied with input data in the order of rows. That is, 8-term row vector data x(0, j),
x(1, j), ..., x(7, j) (j = 0, 1, ..., 7) are applied in turn.

Preprocessing section la carries out preprocessing for the respective row vector data. For a zeroth row, 
25 for example, data (xOO ± x07), (xOI ± x06), (x02 ± x05) and {x03 ± x04) are output from preprocessing 
section la. Data rearranging circuit 2a reanranges four words (four addition data or four subtraction date) to 
output the rearranged data to product sum operation section 3a. Product sum operation section 3a applies a 
product sum operation to the applied data. The processing operation of date rearranging circuit 2a and 
product sum operation section 3a is the same as those of the processor shown in Rg. 4. 
30 Accordingly, first one^imensional DCT processing section 11a outputs in the order of rows 8-tenn row 
vector date Rk subjected to one-dimensional DCT processing witii respect to a row direction. Rk is an 8r 
term row vector of Rk = (RkO. Rkl, ... RkT), where k = 0, 1, ... 7. 

This intermediate date Rk Is applied to transposition circuit 12 and stored therein in the order of rows. 
When 8-row intennediate date RO - R7 are stored in transposition circuit 12, transposition circuit 12 outputs 
35 intennediate date in the order of columns to second one-dimensional DCT processing circuit lib. The 
intermediate date stored in transposition circuit 12 is date which is subjected to an operation processing 
with respect to "i" in expression (14). Transposition drcuit 12 outputs intennediate date in the order of 
columns. In the zerotfi column, for example, date ROO. RIO, R20. ... R70 are read in turn. 

Second one-dimensional DCT processing section 11b carries out the same preprocessing, the same
data rearranging processing and the same product sum operation processing for each column as those of
first one-dimensional DCT processing section 11a. Accordingly, second one-dimensional DCT processing
section 11b outputs data subjected to addition with respect to "j" in expression (14). That is, 8-term column
vector data are output in the order of columns from output terminal 5. The data appearing on output
terminal 5 are data subjected to one-dimensional DCT processing in both row and column directions, i.e.,
two-dimensional DCT processing.

Like the transposition in the circuit shown in Fig. 2, transformation from rows to columns in transposition
circuit 12 is realized by changing an address of a buffer memory in the row direction in data writing and in
the column direction in data reading.
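
The row-column flow through the two one-dimensional sections and the transposition buffer can be sketched as follows. This is an illustrative software model only, assuming an unnormalized DCT; dct_1d, transpose and dct_2d are hypothetical names, and a list of lists stands in for the buffer memory that is written by rows and read by columns.

```python
import math

def dct_1d(v):
    """Unnormalized one-dimensional DCT of one row or column vector."""
    n = len(v)
    return [sum(v[j] * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                for j in range(n))
            for k in range(n)]

def transpose(rows):
    """Model of the buffer: written row by row, read column by column."""
    return [list(col) for col in zip(*rows)]

def dct_2d(block):
    """Two-dimensional DCT as two one-dimensional passes."""
    stage1 = [dct_1d(row) for row in block]      # first section: row direction
    columns = transpose(stage1)                  # transposition circuit
    return [dct_1d(col) for col in columns]      # second section: column direction

block = [[(3 * r + 5 * c) % 7 for c in range(4)] for r in range(4)]
result = dct_2d(block)   # result[v][u] is delivered in the order of columns
```

Because the second pass works on columns, the output is indexed column first: result[v][u] holds the coefficient for vertical frequency u and horizontal frequency v.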

Also, the two-dimensional IDCT processing can be realized by expanding the one-dimensional IDCT
processor shown in Fig. 11. Fig. 14 shows configuration of a two-dimensional IDCT processor.

Referring to Fig. 14, the two-dimensional IDCT processor includes a first one-dimensional IDCT
processor 13a and a second one-dimensional IDCT processor 13b.

First one-dimensional IDCT processor 13a includes a data rearranging circuit 2a, a product sum
operation section 3a and a postprocessing section 7a. Second one-dimensional IDCT processor 13b
includes a data rearranging circuit 2b, a product sum operation section 3b and a postprocessing section 7b.
Both first and second one-dimensional IDCT processors 13a and 13b carry out the same processing as that
of the one-dimensional IDCT processor shown in Fig. 11.

Input terminal 4 is supplied with input data in the order of rows. First IDCT processor 13a carries out an
IDCT processing with respect to rows.

Transposition circuit 12 sequentially stores therein intermediate data applied in the order of rows from 
first IDCT processor 13a and outputs the stored intermediate data in the order of columns. 

Second IDCT processor 13b carries out an IDCT processing for the respective columns. Accordingly, 
output terminal 5 is supplied with the data subjected to the IDCT processing in both row and column 
directions, i.e., two-dimensional IDCT processing, in the order of columns. 

Fig. 15 shows configuration of a two-dimensional DCT/IDCT processor being still another embodiment
of the present invention. The processor of Fig. 15 includes a first one-dimensional DCT/IDCT processor 14a,
a second one-dimensional DCT/IDCT processor 14b, and a transposition circuit 12 provided between
processors 14a and 14b.

First and second processors 14a and 14b are of the same configuration as that of the processor shown
in Fig. 12 and include a preprocessing section 1 (1a, 1b), a data rearranging circuit 2 (2a, 2b), a product
sum operation section 3 (3a, 3b) and a postprocessing section 7 (7a, 7b).

In the configuration of Fig. 15, like the configuration shown in Fig. 12, if preprocessing sections 1a and
1b and postprocessing sections 7a and 7b are selectively set in a through state and coefficients (used in
the partial sum generation circuit) of product sum operation sections 3a and 3b are selected, then two-
dimensional DCT and IDCT processings can selectively be carried out.

An operation of the processor of Fig. 15 is identical to those of the processors of Figs. 13 and 14. One
processing mode of the DCT processing and the IDCT processing is set by a control circuit not shown
(corresponding to control circuit 8 of Fig. 12).

Although the foregoing description has not been concerned with implementation forms of the DCT
processors, the use of the above-described configuration makes it possible to easily incorporate all of DCT
(inverse DCT) functions integrally on a semiconductor integrated circuit.

It is also possible to incorporate all of the above-described DCT/inverse DCT functions integrally on a
semiconductor integrated circuit and simultaneously incorporate functional circuitry having functions other
than the DCT/inverse DCT functions integrally on one semiconductor substrate. Fig. 16 shows an example
of use of a DCT processor which is incorporated integrally on one semiconductor substrate simultaneously
with other functional circuitry.

Referring to Fig. 16, a semiconductor integrated circuit (semiconductor chip) 50 includes a DCT
processor 51 and functional circuits 52, 53 and 54.

DCT processor 51 has such configuration as shown in Fig. 13 or 17. Functional circuits 52, 53 and 54
have different functions A, B and C, respectively. In application to video data processing, functions A, B and
C include such functions necessary for image compression as quantization, variable length coding (entropy
coding) and the like. The functions necessary for image compression are standardized by, for example,
JPEG (Joint Photographic Experts Group).

In the configuration shown in Fig. 16, DCT processor 51 is used in cooperation with (or in clock
synchronization with) functional circuits 52, 53 and 54.

In the embodiment shown in Fig. 16, the functional circuits integrated together with the DCT processor
are dedicated circuits having specific functions. The functional circuits are not limited to such dedicated
circuits and may be integrated together with a microprocessor or a programmable functional block 56 such
as a DSP (Digital Signal Processor) as shown in, for example, Fig. 17. Further, the DCT processor may be
integrated together with a dedicated functional circuit 55 and programmable functional block 56 in
combination as shown in Fig. 17.

The summary of principal technical effects of the present invention is as follows:

(i) Since the required number of times of multiplication is reduced by preprocessing in DCT processing
or by postprocessing in IDCT processing, load on a product sum operation circuit is reduced.

(ii) Since a product sum operation is carried out by a memory and an adder, the scale of circuitry is
substantively reduced.

(iii) Because of the above item (ii), a parallel multiplication circuit is unnecessary. Accordingly, when the
entire processor performs a synchronizing operation, a higher operation speed on a worst delay path is
easily achieved, facilitating a faster processing.

(iv) Since the effect of the above item (iii) facilitates an upgrading of a DCT (or IDCT) processor, this
effect is greatly advantageous particularly in implementation of the present DCT (or IDCT) processor on
a semiconductor integrated circuit, together with the effect of reducing the circuit scale.
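
Item (ii) above, a product sum carried out by a memory and an adder, can be illustrated with a small software model. The sketch below is an assumption-laden illustration, not the circuit of the reference: it processes bits most significant first and doubles the accumulator, whereas a hardware accumulator may shift in the other direction, and the names build_table and product_sum_da are hypothetical. Input words are treated as two's complement integers of a fixed width.

```python
def build_table(coeffs):
    """Table memory: entry 'addr' holds the sum of the coefficients whose bit is set."""
    n = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(1 << n)]

def product_sum_da(coeffs, words, bits=12):
    """Compute sum(coeffs[j] * words[j]) with table look-ups and additions only."""
    table = build_table(coeffs)
    mask = (1 << bits) - 1
    raw = [w & mask for w in words]          # two's complement bit patterns
    acc = 0.0
    for b in range(bits - 1, -1, -1):        # one bit plane per step, MSB first
        addr = 0
        for j, r in enumerate(raw):
            addr |= ((r >> b) & 1) << j      # same bit position of every word
        partial = table[addr]
        if b == bits - 1:
            partial = -partial               # the sign bit carries negative weight
        acc = 2 * acc + partial              # shift the running sum and add
    return acc

coeffs = [0.5, -0.25, 0.75, 1.0]
words = [3, -7, 100, -2048]
value = product_sum_da(coeffs, words)
```

No multiplier appears in the loop: each step is one table read addressed by a bit plane, one shift and one addition, which is the trade the summary describes.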
Although the present invention has been described and illustrated in detail, it is clearly understood that
the same is by way of illustration and example only and is not to be taken by way of limitation, the spirit
and scope of the present invention being limited only by the terms of the appended claims.

Claims

1. A processor having at least a function of carrying out one-dimensional discrete cosine transform (DCT)
of N-term input data X wherein said N is a positive integer, said processor comprising:

preprocessing means (1; 1a, 1b) for carrying out addition and subtraction for each of sets of
predetermined two terms of said input data X to generate a first set of addition data and a second set
of subtraction data; and

matrix product means (2, 3) for obtaining a first matrix product of said first set of data from said
preprocessing means and a predetermined first coefficient matrix, and a second matrix product of said
second set of data and a predetermined second coefficient matrix, wherein an output of said matrix
product means provides N-term output data subjected to DCT processing.

2. The processor of claim 1, wherein 

said preprocessing means includes: 

set generating means (21) for generating a set of pth term data X(p) and qth term data X(q) of said
input data X, where p + q = N-1, 0 ≤ p < q ≤ N-1, and p and q are integers;

addition means (22; 26) for carrying out addition of 2-term data output from said set generating
means; and

subtraction means (23, 26) for carrying out subtraction of the 2-term data output from said set
generating means.

3. The processor of claim 1 or 2, wherein 

said matrix product means (2, 3) includes 

storage means (32) for sequentially receiving said first set of data from said preprocessing means 
(1) to store the received data therein, each of said first set of data having a plurality of bits, and 

parallel reading means (33) for reading, in parallel and in order, one-bit data in the same bit figure 
of all of said first set of data stored in said storage means. 

4. The processor of one of claims 1 to 3, wherein 

said matrix product means (2, 3) includes 

storage means (32) for sequentially receiving said second set of data from said preprocessing
means (1) to store the received data therein, each of said second set of data having a plurality of bits,
and

parallel reading means (33) for reading in parallel and in order, one-bit data in the same bit figure 
of all of said second set of data stored in said storage means. 

5. The processor of claim 3 or 4, wherein 

said matrix product means (2, 3) further includes 

a plurality of product sum operation means (6a - 6d), and wherein each said product sum operation
means includes

table memory means (43) for receiving parallel bit data from said parallel reading means (33) as an
address signal to output a corresponding partial sum, said table memory means (43) storing in advance
a product sum of a corresponding coefficient and parallel bit data in a table form, and

accumulation means (42) for accumulating outputs of said table memory means (43), said
accumulation means output providing a first set of output data of said N-term output data.

6. The processor of claim 5, wherein

said accumulation means (42) includes 

addition means (44) for receiving an output of said table memory means at its one input, 
register means (45) for temporarily storing an output of said addition means therein, and 
shift means (46) for shifting storage data in said register means by a bit to apply the shifted data to 
the other input of said addition means. 

7. The processor of one of claims 1 to 6, wherein

said matrix product means (2, 3) further includes

a plurality of product sum operation means (6e - 6h; 6a - 6d) for generating a second set of said N-
term output data, and wherein

each said product sum operation means includes
table memory means (43) for receiving parallel bit data from said parallel reading means (33) as an
address signal to output a corresponding partial sum, said table memory means (43) storing in advance
a product sum of a corresponding coefficient and parallel bit data in a table form, and

accumulation means (42) for accumulating outputs of said table memory means to generate a 
second set of data of said N-term output data. 

8. The processor of one of claims 5 to 7, wherein

said accumulation means (42) includes

addition means (44) for receiving an output of said table memory means at its one input,

register means (45) for temporarily storing an output of said addition means, and

shift means (46) for shifting storage data in said register means by a bit to apply the shifted data to
the other input of said addition means, a final output of said shift means providing said second set of
data of said N-term output data.

9. The processor of one of claims 1 to 8, further comprising: 

a postprocessing section (7) for receiving an output of said matrix product means to carry out
addition and subtraction of predetermined 2-term data of the received N-term data and generate first
and second sets of output data; and

control means (8) for enabling one of said preprocessing section (1) and said postprocessing 
section. 

10. The processor of claim 9, wherein 

said postprocessing section (7) includes means (22, 23; 26) for carrying out addition and
subtraction of (2i)th-term data Y (2i) and (2i + 1)th-term data Y (2i + 1) of N-term output data Y of
said matrix product means (2, 3) wherein said i is an integer of 0 ≤ i ≤ N/2 - 1.

11. The processor of claim 10, wherein

the addition of said data Y (2i) and Y (2i + 1) indicates (i)th-term output data Z (i), and the
subtraction of said data Y (2i) and Y (2i + 1) indicates (N - i - 1)th-term output data Z (N - i - 1).

12. The processor of one of claims 1 to 11, further comprising:

transposition means (12) for sequentially receiving output data of said matrix product means (2, 3)
to store the received data therein, transpose a matrix formed by the stored data and sequentially output
N-term intermediate data;

second preprocessing means (1b) having the same configuration as that of said preprocessing
means, for receiving an output of said transposition means to carry out addition and subtraction for
each of predetermined 2-term sets of said N-term intermediate data; and

second matrix product means (2b, 3b) having the same configuration as that of said matrix product
means, for performing a product operation of output data of said second preprocessing means and a
predetermined second coefficient matrix, an output of said second matrix product means indicating
data subjected to two-dimensional DCT processing.

13. The processor of one of claims 9 to 12, further comprising:

second postprocessing means (7b) having the same configuration as that of said postprocessing
means, for receiving an output of said second matrix product means, and

second control means (8) for enabling one of said second preprocessing means (1b) and said
second postprocessing means.

14. The processor of one of claims 1 to 13, wherein

said processor is incorporated integratedly in an integrated circuit (50) with functional circuitry (52,
53, 54; 55, 56).

15. The processor of one of claims 1 to 14, wherein
said N is 8. 

16. A processor having at least a function of carrying out one-dimensional inverse discrete cosine transform
(IDCT) of N-term input data Y, wherein said N is a positive integer, said processor comprising:

matrix product means (2, 3) for dividing said N-term input data Y into a first set of input data and a
second set of input data and carrying out a product operation of said first set of input data and a first
coefficient matrix and a product operation of said second set of input data and a second coefficient
matrix, to generate a first set of intermediate data Mi and a second set of intermediate data Ni, wherein
said i is an integer of 0 ≤ i ≤ N/2 - 1; and

postprocessing means (7; 7a, 7b) for carrying out addition and subtraction of two intermediate data
in a predetermined relationship in said first set of intermediate data and said second set of intermediate
data from said matrix product means to generate first and second sets of output data Xi.

17. The processor of claim 16, wherein

said postprocessing means (7; 7a, 7b) includes means (22, 23; 26) for carrying out addition and
subtraction of said first set of (i)th-term intermediate data Mi and said second set of (i)th-term
intermediate data Ni; and

addition data (Mi + Ni) indicates (i)th-term data of N-term output data, and subtraction data (Mi -
Ni) indicates (N - i - 1)th-term data of said N-term output data.

18. The processor of claim 16 or 17, wherein

each said intermediate data is represented by a plurality of bits, and
said matrix product means (2, 3) includes

storage means (32) for dividing said N-term input data Y into a first set of input data Y (2i) and a
second set of input data Y (2i + 1) to store each set of the input data therein,

first reading means (33) for reading in parallel one-bit data in the same order of said first set of all
input data Y (2i) from said storage means,

second reading means (33) for reading in parallel one-bit data in the same bit figure of said second
set of all input data Y (2i + 1) from said storage means,

first product sum operation means (6a - 6d) for carrying out a product sum operation of parallel bit
data from said first reading means and a corresponding coefficient of said first coefficient matrix, to
generate said first set of output data, and

second product sum operation means (6e - 6h) for carrying out a product sum operation of parallel
bit data from said second reading means and a corresponding coefficient of said second coefficient
matrix, to generate said second set of output data.

19. The processor of claim 18, wherein

said first and second product sum operation means include a plurality of operation circuits each
related to one term of said output data, each said operation circuit means (6a - 6h) including
table memory means (43) for receiving parallel bit data as an address signal to output the result of
the product sum operation with the corresponding coefficient, said table memory means storing in
advance data indicating the result of the product sum operation in a table form, and
accumulation means (42) for accumulating outputs of said table memory means.

20. The processor of claim 19, wherein

said accumulation means includes

2-input addition means (44) for receiving an output of said table memory means (43) at its one
input,

register means (45) for temporarily storing an output of said addition means, and
shift means (46) for shifting storage data in said register means by a bit to apply the shifted data to
the other input of said addition means, a final output of said shift means indicating output data of an
associated term.

21. The processor of one of claims 16 to 20, further comprising:

preprocessing means (1; 1a, 1b) for carrying out addition and subtraction of a predetermined set of
2-term data Y (j), Y (N - j - 1) of said N-term input data Y to generate a first set of addition data and a
second set of subtraction data, said first set of said addition data and said second set of said
subtraction data being applied as said first and second sets of input data to said matrix product means;
and

control means (8) for enabling one of said preprocessing means and said postprocessing means.

22. The processor of one of claims 16 to 21, further comprising: transposition means (12) for sequentially
receiving N-term N output data from said postprocessing means (7; 7a, 7b) to store the received data
therein, then transpose the stored data and output the transposed data;

second matrix product means (2b, 3b) of the same configuration as that of said matrix product
means, for receiving an output of said transposition means; and

second postprocessing means (7b) of the same configuration as that of said postprocessing means,
for receiving an output of said matrix product means,

an output of said second postprocessing means indicating data subjected to two-dimensional IDCT
processing of N by N points.

23. The processor of one of claims 16 to 22, wherein

said N is 8. 

24. The processor of one of claims 16 to 23, wherein 

said processor (51) is incorporated integratedly in an integrated circuit (50) so as to operate in 
cooperation with other functional circuitry (52, 53, 54; 55, 56). 

25. A method of processing one dimensional discrete cosine transform of N points X, wherein said N is 2^m,
said m being a natural number, said method comprising the steps of:

carrying out addition and subtraction of each 2-term data in predetermined relationship in said input 
data X, to generate a first set of addition data and a second set of subtraction data, said first and 
second sets including N/2-term data; 

carrying out a product operation of said first set of data and a first coefficient matrix to generate a 
first set of output data; 

carrying out a product operation of said second set of data and a second coefficient matrix to 
generate a second set of output data; and 

outputting said first set of output data and said second set of output data in a predetermined order. 

26. The method of claim 25, wherein

said step of generating said first and second sets of output data includes the step of generating a 
corresponding partial sum by reference to a table memory, using an applied data as an address signal. 

27. The method of claim 25 or 26, wherein

said 2-term data in said predetermined relationship are (i)th-term data x (i) and (N - i - 1)th-term
data x (N - i - 1), wherein said i is an integer of 0 ≤ i ≤ N/2 - 1.

28. A method of carrying out one-dimensional inverse discrete cosine transform of N points, wherein said N
is 2^m, said m being a natural number, said method comprising the steps of:

receiving N-term input data Y to generate a first set of input data of even-term data Y (2i) and a
second set of input data of odd-term input data Y (2i + 1), wherein said i is an integer of 0 ≤ i ≤ N/2 -
1;

carrying out a product operation of said first set of input data and a first coefficient matrix to 
generate a first set of intermediate data M (i); 

carrying out a product operation of said second set of input data and a second coefficient matrix to 
generate a second set of intermediate data N (i); 

carrying out addition and subtraction of said first set of intermediate data M (i) and said second set
of intermediate data N (i) to generate a first set of addition data and a second set of subtraction data;
and 

outputting said first set of said addition data and said second set of said subtraction data in a 
predetermined order. 

29. The method of claim 28, wherein

said addition data is a sum of data M (i) and data N (i), said addition data of M (i) + N (i) indicating
ith-term data X (i) of N-term output data; and

said subtraction data is a difference between data M (i) and data N (i), said subtraction data of M (i)
- N (i) providing (N - i - 1)th-term output data X (N - i - 1).

30. The method of claim 28 or 29, wherein

said step of generating said intermediate data M (i) and N (i) includes the step of generating a
corresponding partial sum by reference to a table memory, using the applied data as an address.
FIG. 2
FIG. 3
FIG. 11 [figure: data rearranging circuit; product sum operation circuits 6a, 6b, 6c, ...; postprocessing section]

EP 0 506 111 A2
channel. Above this bound decoding of intensity stereo is applied using the scalefactors of the right channel
as intensity stereo positions. An intensity stereo position of 7 in one scalefactor band indicates that this
scalefactor band is not decoded as intensity stereo.



Scalefactor bands:

| | | | | | | | | | | | . . . | |

|<- nonzero part of spectrum (right chan) ->|<- zero part of spectrum ->|

|<- m/s or l/r stereo coded part ->|<- intensity stereo coded part ->|

For each scalefactor band (sb) coded in intensity stereo, the following steps are executed:

1) the intensity stereo position is_pos_sb is read from the scalefactor of the right channel.

2) if (is_pos_sb == 7) do not perform the following steps (illegal is_pos).

3) is_ratio = tan(is_pos_sb * PI/12)

4) L_i := L_i * is_ratio / (1 + is_ratio) for all indices i within the actual scalefactor band sb.

5) R_i := L_i * 1 / (1 + is_ratio) for all indices i within the actual scalefactor band sb.
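
The five steps can be sketched in code as follows. This is an illustrative reading only: it assumes the ratio is tan(is_pos * PI/12) as in ISO/IEC 11172-3, it assumes both outputs are computed from the transmitted left-channel value before either is overwritten, and the function name intensity_stereo_band is hypothetical.

```python
import math

def intensity_stereo_band(left, is_pos):
    """Split the transmitted values of one scalefactor band into left and right.

    Returns None when is_pos is 7, i.e. the band is not intensity-stereo coded.
    """
    if is_pos == 7:
        return None                                    # step 2: skip this band
    is_ratio = math.tan(is_pos * math.pi / 12)         # step 3
    new_left = [v * is_ratio / (1 + is_ratio) for v in left]   # step 4
    new_right = [v * 1 / (1 + is_ratio) for v in left]         # step 5
    return new_left, new_right

band = [4.0, -2.0, 1.0]
centre = intensity_stereo_band(band, 3)    # equal split between channels
hard_right = intensity_stereo_band(band, 0)
skipped = intensity_stereo_band(band, 7)
```

In this sketch the two outputs always add up to the transmitted value, so the position only controls how the energy is shared between the channels.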

2.4.3.4.10 Synthesis filterbank

Figure A.4. shows a block diagram including the synthesis filterbank. The frequency lines are preprocessed
by the "alias reduction" scheme (see the block diagram in figure A.5 and in table B.9. for the
coefficients) and fed into the IMDCT matrix, each 18 into one transform block. The first half of the output
values are added to the stored overlap values from the last block. These values are new output values and are
input values for the polyphase filterbank. The second half of the output values is stored for overlap with the
next data granule. For every second subband of the polyphase filterbank every second input value is
multiplied by -1 to correct for the frequency inversion of the polyphase filterbank.
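
The overlap and sign-correction steps of this paragraph can be sketched as below. The names overlap_add and frequency_inversion are hypothetical, and the choice of the odd-numbered subbands and odd-numbered time samples for the sign change is an assumption, since the paragraph only says "every second".

```python
def overlap_add(block_out, stored_overlap):
    """Combine one transform block's output with the overlap kept from the last block.

    Returns the new output values and the overlap to store for the next granule.
    """
    half = len(block_out) // 2
    output = [block_out[i] + stored_overlap[i] for i in range(half)]
    next_overlap = list(block_out[half:])
    return output, next_overlap

def frequency_inversion(subbands):
    """Negate every second value in every second subband (odd/odd assumed)."""
    result = [list(sb) for sb in subbands]
    for sb in range(1, len(result), 2):
        for ts in range(1, len(result[sb]), 2):
            result[sb][ts] = -result[sb][ts]
    return result

out, keep = overlap_add([1, 2, 3, 4], [10, 20])
flipped = frequency_inversion([[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]])
```

For a long block the transform output has 36 values, so 18 go to the polyphase filterbank and 18 are kept as overlap.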

2.4.3.4.10.1 Alias reduction 

For long block type granules (block_type != 2) the input to the synthesis filterbank is processed for alias
reduction before processing by the IMDCT. The following pseudo code describes the alias reduction
computation:

for (sb=1; sb<32; sb++)
   for (i=0; i<8; i++) {
      xar[18*sb-1-i] = xr[18*sb-1-i] * Cs[i] - xr[18*sb+i] * Ca[i]
      xar[18*sb+i] = xr[18*sb+i] * Cs[i] + xr[18*sb-1-i] * Ca[i]
   }

The indices of arrays xar[] and xr[] label the frequency lines in a granule, arranged in order of lowest
frequency to highest frequency, with zero being the index of the lowest frequency line, and 575 being the
index of the highest. The coefficients Cs[i] and Ca[i] can be found in table B.9. Figures A.5 and A.6
illustrate the alias reduction computation.

Alias reduction is not applied for granules with block_type == 2 (short block).
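
A runnable rendering of the alias reduction butterflies is sketched below. The coefficient arrays are passed in as parameters rather than copied from table B.9, which is not reproduced here; the function name alias_reduce is hypothetical. The result is written to a separate array so that each butterfly reads only unmodified input values.

```python
def alias_reduce(xr, cs, ca):
    """Apply 8 butterflies across each of the 31 subband boundaries of a granule.

    xr holds 576 frequency lines; cs and ca hold 8 coefficients each.
    """
    xar = list(xr)                       # lines not touched by a butterfly pass through
    for sb in range(1, 32):
        for i in range(8):
            lo = 18 * sb - 1 - i         # line just below the boundary
            hi = 18 * sb + i             # line just above the boundary
            xar[lo] = xr[lo] * cs[i] - xr[hi] * ca[i]
            xar[hi] = xr[hi] * cs[i] + xr[lo] * ca[i]
    return xar

lines = [float(k) for k in range(576)]
unchanged = alias_reduce(lines, [1.0] * 8, [0.0] * 8)
swapped = alias_reduce(lines, [0.0] * 8, [1.0] * 8)
```

With cs all ones and ca all zeros the granule passes through unchanged, which is a convenient sanity check before substituting the real table values.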

2.4.3.4.10.2 IMDCT

In the following, n is the number of windowed samples (for short blocks n is 12, for long blocks n is 36).
In the case of a block of type "short", each of the three short blocks is transformed separately. n/2 values
X_k are transformed to n values x_i. The analytical expression of the IMDCT is:

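
$$x_i = \sum_{k=0}^{\frac{n}{2}-1} X_k \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k + 1)\right) \quad \text{for } i = 0 \text{ to } n-1$$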
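
A direct, unoptimized evaluation of the IMDCT can be sketched as follows, assuming the expression x_i = sum over k from 0 to n/2 - 1 of X_k cos((PI/(2n)) (2i + 1 + n/2) (2k + 1)) for i = 0 to n - 1. The function name imdct is hypothetical, and no windowing is applied.

```python
import math

def imdct(spectrum):
    """Transform n/2 frequency values into n time values by direct summation."""
    half = len(spectrum)
    n = 2 * half
    return [sum(spectrum[k]
                * math.cos(math.pi / (2 * n) * (2 * i + 1 + half) * (2 * k + 1))
                for k in range(half))
            for i in range(n)]

short_in = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]    # n/2 = 6, as for a short block
short_out = imdct(short_in)                   # n = 12 values
long_out = imdct([0.0] * 17 + [1.0])          # n/2 = 18, as for a long block
```

The output is redundant: its first half is antisymmetric and its second half symmetric about their midpoints, which is why the overlap with the neighbouring block is needed to recover the signal.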
Figure A.4 - Layer III decoder diagram